Abstract.

A modified cluster analysis method has been developed to identify spatial 
patterns of planetary flow regimes, and to study transitions between them. This 
method has been applied first to a simple deterministic model and second to 
Northern Hemisphere (NH) 500 mb data. 

The dynamical model is governed by the fully nonlinear, equivalent-barotropic vorticity equation on the sphere. Clusters of points in the model's phase space are associated with either a few persistent or many transient events. Two stationary clusters have patterns similar to unstable stationary model solutions, zonal or blocked. Transient clusters of wave trains serve as way stations between the stationary ones.

For the NH data, cluster analysis was carried out in the subspace of the first seven empirical orthogonal functions (EOFs). Stationary clusters are found in the low-frequency band of periods longer than 10 days, and transient clusters in the band-pass frequency window between 2.5 and 6 days.


In the low-frequency band, three pairs of clusters determine, respectively, EOFs 1, 2 and 3. They exhibit well-known regional features, such as blocking, the Pacific/North American (PNA) pattern and wave trains. Both model and low-pass data show strong bimodality.

Clusters in the band-pass window show wave-train patterns in the two jet exit regions. They are related, as in the model, to transitions between stationary clusters.
1. INTRODUCTION





It is well known that certain large-scale atmospheric circulation patterns persist for time intervals longer than those typical of midlatitude cyclones [Baur, 1947; Namias, 1982]. A few of these patterns also have a tendency to recur from time to time. Identifying the patterns that tend both to recur and to persist, as well as determining preferred transitions between them, can deepen our knowledge of low-frequency atmospheric variability and enhance our skill in long-range forecasting (LRF).

Recurrent and persistent patterns can be global, hemispheric or regional. Certain patterns associated with specific phases of the El Niño/Southern Oscillation are known to be global [Rasmusson and Wallace, 1983]. Most blocking episodes, in both the northern and the southern hemispheres, are regional in character [Dole, 1986; Dole and Gordon, 1983; Trenberth and Mo, 1985]. Typical of hemispheric patterns are those associated with the dominance of zonal wavenumbers three or four in the Southern Hemisphere [Mo, 1986; Mo and Ghil, 1987].

In this article, we shall concentrate on recurrent and persistent patterns 
of hemispheric extent, associated with the atmospheric circulation in the 
Northern Hemisphere (NH) extratropics. These patterns will be studied first 
in the solutions of a greatly simplified dynamical model, and then in an 
atmospheric data set. 

One way to identify hemispheric patterns which persist is the pattern correlation method (PCM). A sequence of daily hemispheric weather maps is defined to constitute a persistent or quasi-stationary (QS) event if the spatial correlation between any pair of maps within the sequence exceeds a given threshold $\rho_0$, say $\rho_0 = 0.5$, and if the duration of the event so defined also exceeds a given threshold. Based on the ensemble-mean decorrelation time of daily weather maps, typical duration thresholds for QS events are seven days in the Northern, and five days in the Southern Hemisphere.

Using this criterion, Horel [1985a] identified 58 QS events in a set of NH winter data. These events were not easy to classify subjectively into a small number of categories, due to the apparent diversity of their spatial patterns. In the Southern Hemisphere, Mo [1986] classified most of 23 QS events into three major categories by visual inspection: two were dominated by a planetary wave of zonal wavenumber three, but with nearly opposite phases, and one by zonal wavenumber four. The question we are asking here is to what extent purely objective, statistical criteria can be used to classify QS events into a usefully small number of categories, and how these categories, or flow regimes, can be used in LRF.

Recently the authors [Ghil, 1987; Mo and Ghil, 1987] have considered systematic connections between the statistical and dynamical methods of description and prediction of QS events. They found that, both in the solutions of simple dynamical models and in atmospheric data sets, the first few empirical orthogonal functions (EOFs) had patterns similar to the most frequently occurring QS events. This could be explained by the fact that these EOFs pointed to the largest concentrations of invariant measure in the system's phase space, which were also the locus of the QS events. Such a result was to be expected from the ergodic theory of dynamical systems [Eckmann and Ruelle, 1985; Ghil and Childress, 1987, Sections 6.4 and 6.6; Ghil et al., 1985, pp. 14-16], but the amount of specific information extracted for a complex system like the Earth's atmosphere appears rather gratifying.

Still, the direct and exclusive use of EOFs in classifying QS events has two main disadvantages. First, the spatial orthogonality imposed on the flow patterns associated with each class, or flow regime, is an oversimplification: it replaces relative dynamical independence by complete lack of statistical correlation. Second, the spatial patterns of EOFs become successively more complicated, with smaller and smaller scales present as the variance associated with them decreases. Higher EOFs therefore cannot be expected to resemble any large-scale QS events, nor classes of such events.

The purpose of this article is to develop and apply an objective method for 
the classification of QS events into a few planetary flow regimes, and to 
examine transitions between these regimes. We develop a modified cluster 
analysis method and apply it to two types of data sets. One is obtained from 
extended integrations of a very simple, deterministic, nonlinear model of NH flow [Legras and Ghil, 1985; Ghil and Childress, 1987, Section 6.4]. The other is a set of 500 mb geopotential height maps for NH winter.

In Section 2, we describe the two data sets, and in Section 3 we present 
the method. Results are reported in Section 4 for the simple model and in 
Section 5 for the NH 500 mb data. Conclusions follow in Section 6. 

2. DATA SETS AND THEIR PREPARATION 

Model data 

Following the approach of Mo and Ghil [1987], we first develop and check our statistical methodology on a data set with a simpler structure, generated by a nonlinear deterministic model. To the extent that model solutions are time-dependent and actually aperiodic, they exhibit sufficient irregularity to justify a statistical treatment, as explained by Ghil [1987].

The model is governed by the equivalent-barotropic form of the equation for the conservation of potential vorticity on a sphere [Ghil and Childress, 1987, Chapters 3 and 6; Legras and Ghil, 1985], truncated to 25 spherical harmonics. It has a forcing by a zonal jet, Ekman dissipation and a simplified topography of zonal wavenumber two, representing two equal continental masses and two equal oceans. Subsequent maps of the model's streamfunction fields use a polar stereographic projection from the North Pole onto a full disk. The position of the model continents is indicated by heavy lines on the periphery of this disk. The distribution of land masses resembles, albeit schematically, that of the Northern more than that of the Southern Hemisphere, and equatorial symmetry makes this essentially a NH model.

In previous publications, the dependence of model behavior on various parameters was carefully investigated. Here it suffices to use one set of parameter values, which is both realistic [T. P. Barnett and J. O. Roads, personal communication, 1986] and at the center of the region in parameter space where interesting solution behavior obtains. The forcing parameter $p$, giving the intensity of the zonal jet, is set to the value $p = p_m = 0.211$, the dissipation parameter $\alpha$ to a value corresponding to the relaxation time of $\alpha^{-1} = 20$ days, and the height of the topography to a nondimensional value of $h_0 = 0.1$, relative to the atmospheric scale height.

For these parameter values, a model integration of roughly 65 years of simulated time was used. More precisely, this corresponds to a time interval of $8000\tau$, where the sampling time $\tau$ equals 1.5 nondimensional time units, which is 3.0 days at $p = 0.20$ and 2.83 days at our value of $p = p_m$. The first solution segment of $1000\tau$ was omitted so as to make the results independent of the initial data. The time mean was computed by averaging over $1001\tau \le t \le 8000\tau$, and the streamfunction anomaly at a given time is defined as the deviation from this time mean.

The persistence properties of model solutions for $p = p_m$ were discussed in Mo and Ghil [1987]. Pattern correlations $\rho(t+m\tau, t+n\tau)$ for pairs of maps were computed between each given time $t$ and the 5 consecutive sampling times after it, $0 \le m < n \le 5$. QS events were identified by requiring that the pattern correlations between all maps within a series of 6 or more be larger than or equal to $\rho_0 = 0.5$. If $\rho(t_0+m\tau, t_0+n\tau) \ge \rho_0$ for $0 \le m < n \le 5$, and $\rho(t_0+\tau, t_0+6\tau) < \rho_0$, then the event is said to last just 5 sampling intervals. On the other hand, if $\rho(t_0+m\tau, t_0+n\tau) \ge \rho_0$ for both $0 \le m < n \le 5$ and $1 \le m < n \le 6$, the QS event lasts 6 intervals, and so on. Maps for all QS events were plotted.
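The persistence criterion above can be sketched in Python as follows. This is a minimal illustration, not the authors' code. It assumes a precomputed matrix `corr` of pattern correlations between sampling times. It reads the "and so on" rule as chaining overlapping windows of 6 maps whose pairwise correlations all reach $\rho_0$. The function name `qs_events` is hypothetical.

```python
import numpy as np

def qs_events(corr, rho0=0.5, window=6):
    """Find quasi-stationary (QS) events from a pattern-correlation matrix.

    corr[i, j] is the pattern correlation between maps i and j. A window of
    `window` consecutive maps qualifies if every pairwise correlation inside
    it is >= rho0; qualifying windows starting at consecutive times are
    chained into one event. Returns (start index, duration in sampling
    intervals) for each event.
    """
    n = corr.shape[0]
    ok = np.zeros(max(n - window + 1, 0), dtype=bool)
    for s in range(ok.size):
        ok[s] = np.all(corr[s:s + window, s:s + window] >= rho0)
    events = []
    s = 0
    while s < ok.size:
        if not ok[s]:
            s += 1
            continue
        e = s
        while e + 1 < ok.size and ok[e + 1]:
            e += 1
        # windows s..e cover maps s..e+window-1, i.e. (e - s) + window - 1 intervals
        events.append((s, e - s + window - 1))
        s = e + 1
    return events
```

A single qualifying window gives an event of 5 sampling intervals, and each further overlapping window adds one interval. This matches the 5- and 6-interval cases described in the text.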

Most events were easily classified subjectively by their similarity to one or another of the model's stationary solutions. All such solutions are unstable at this point in parameter space, but some of them generate persistent events in their phase-space neighborhood by a mechanism explained in previous publications [e.g., Ghil, 1987, Fig. 15]. The events with the longest durations had patterns resembling either blocking (Figures 1a,c) or a zonal type of flow (Figures 1b,d).
[Fig. 1 near here, please] 

From the time series of streamfunction anomaly coefficients, standardized by the standard deviation in time of each coefficient, the correlation matrix was computed and diagonalized by EOF analysis. In contrast to Mo and Ghil [1987], EOFs will be used here only for spatial filtering purposes. As explained in the Introduction, retaining only a small number of EOFs, those associated with the highest variances, will result in smoother large-scale fields, which presumably contain the signal of the system's variability. It is on these filtered fields that cluster analysis will be performed.
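This EOF filtering step can be illustrated with a minimal NumPy sketch, under our own assumptions. The anomaly coefficients are stored as a (time × variable) array, and the function name `eof_truncate` is hypothetical. The function standardizes each coefficient and diagonalizes the resulting correlation matrix with `numpy.linalg.eigh`. It then keeps the leading principal components.

```python
import numpy as np

def eof_truncate(X, n_eofs):
    """EOF analysis of a (time x variable) anomaly array X.

    Each column is standardized by its standard deviation in time, the
    correlation matrix is diagonalized, and the n_eofs leading principal
    components are returned with the EOFs and their variance fractions.
    """
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    R = Z.T @ Z / Z.shape[0]                 # correlation matrix
    evals, evecs = np.linalg.eigh(R)         # eigh returns ascending order
    order = np.argsort(evals)[::-1]          # sort by decreasing variance
    evals, evecs = evals[order], evecs[:, order]
    E = evecs[:, :n_eofs]                    # orthonormal EOFs as columns
    A = Z @ E                                # principal components A_nu(t)
    return A, E, evals[:n_eofs] / evals.sum()
```

Because the EOFs are orthonormal, the truncated field `A @ E.T` is the projection of the standardized data onto the retained EOF subspace. This projection is the spatial smoothing the text relies on.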

Atmospheric data 

The data set consists of twenty years' worth of twice-daily 500 mb geopotential height maps analyzed by the U.S. National Meteorological Center (NMC) from January 1963 to December 1982. Spectral analysis was used to remove the seasonal cycle at each grid point. The seasonal cycle is defined here as the 20-year mean plus the 20th and 40th Fourier components of the time series, i.e., its annual and semiannual harmonics. Anomalies were then computed as differences between the data and this seasonal cycle.
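This definition of the seasonal cycle can be illustrated with a short NumPy sketch. For simplicity it assumes that the record spans exactly `n_years` years, so that Fourier indices 20 and 40 of a 20-year record correspond to one and two cycles per year. The function name is hypothetical.

```python
import numpy as np

def remove_seasonal_cycle(x, n_years=20):
    """Split x (time along axis 0) into anomalies and a seasonal cycle.

    The cycle is the record mean plus the Fourier components with n_years
    and 2 * n_years cycles per record, i.e. the annual and semiannual
    harmonics when the record is exactly n_years long.
    """
    X = np.fft.rfft(x, axis=0)
    keep = np.zeros(X.shape[0])
    keep[[0, n_years, 2 * n_years]] = 1.0             # mean, annual, semiannual
    mask = keep.reshape((-1,) + (1,) * (x.ndim - 1))  # broadcast over grid points
    cycle = np.fft.irfft(X * mask, n=x.shape[0], axis=0)
    return x - cycle, cycle
```

For a (time × grid point) array, the same mask is applied at every grid point, as in the text.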

For greater conformity with the bulk of the existing literature on persistent anomalies in the NH extratropics, we concentrate in this article on the winter only. The winter season here is taken as the 120 days from November 15 to March 15.

Pattern correlations between pairs of maps one day apart were computed. QS events were identified by requiring that pattern correlations between the pairs of maps on 5 consecutive days be not less than 0.5.

After detrending the time series of anomalies, we filtered this series in time by using separately a low-pass filter and a band-pass filter, as designed by Blackmon [1976]. The low-pass filter has a large-amplitude response for frequencies $0 < f < 0.1$ day$^{-1}$, i.e., for periods 10 days $< T < \infty$. Due to the removal of the seasonal cycle and the retention of winter data only, the variability in this window in fact reflects two separate bands: 10 days $< T <$ 100 days and 300 days $< T < \infty$. The band-pass filter is sensitive to frequencies $0.17 < f < 0.4$ day$^{-1}$, i.e., to periods of 2.5 days $< T <$ 6 days. The two filters have rather sharp cutoffs and little overlap, so that the two windows of low and intermediate frequency are well separated. Cluster analysis will be carried out separately for the low-pass filtered and band-pass filtered data.
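The two frequency windows can be made concrete with an idealized FFT mask in NumPy. This is a swapped-in technique: a rectangular spectral filter, not Blackmon's [1976] finite set of weights. It will show more ringing in time than a tapered filter would. The name `fft_window` is hypothetical.

```python
import numpy as np

def fft_window(x, dt_days, f_lo, f_hi):
    """Keep only the Fourier components with f_lo < f < f_hi (cycles per day).

    An idealized rectangular filter, used only as a stand-in for the
    Blackmon [1976] low-pass and band-pass filters, which it does not reproduce.
    """
    f = np.fft.rfftfreq(x.shape[0], d=dt_days)
    mask = (f > f_lo) & (f < f_hi)
    mask = mask.reshape((-1,) + (1,) * (x.ndim - 1))  # broadcast over space
    return np.fft.irfft(np.fft.rfft(x, axis=0) * mask, n=x.shape[0], axis=0)
```

For twice-daily data (`dt_days = 0.5`), the low-pass window of the text corresponds roughly to `fft_window(x, 0.5, 0.0, 0.1)` and the band-pass window to `fft_window(x, 0.5, 0.17, 0.4)`.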

The spatial filtering started by retaining only data at 358 points out of NMC's 541-point NH grid. This grid of 358 points achieves a compromise between a regular latitude-longitude distribution and an, unfortunately nonexistent, uniform-spacing distribution [Barnston and Livezey, 1987]. Anomalies at 305 points of this grid lying between 20°N and 70°N were standardized by the standard deviation in time at each grid point, and the correlation matrix for the corresponding time series was calculated.

Table 1 gives percentages of variance for each EOF, separately in the low-pass and band-pass windows. The expected error in this estimate of variance, also given in the table, was evaluated by the heuristic formula of North et al. [1982]

$$\delta\lambda / \lambda = (2/N)^{1/2}.$$

Here $\delta\lambda$ is the standard deviation for eigenvalue $\lambda$ and $N$ is the number of independent samples. We took $N = 200$ for the low-pass filtered data and $N = 400$ for the band-passed time series. This is rather conservative, since the total number of samples is $2 \times 120 \times 20 = 4800$, and the decorrelation time, as we shall see, is less than 10 days in the first band (Table 6) and about 3 days in the second.
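The rule of thumb of North et al. [1982] is simple enough to check numerically. The sketch below uses hypothetical function names. It gives the relative error for the two values of $N$ used here. It also adds the usual companion test: neighboring eigenvalues whose spacing is comparable to or smaller than $\delta\lambda$ are effectively degenerate, so their EOFs are poorly determined individually.

```python
import math

def north_error(eigenvalue, n_independent):
    """Sampling error of an EOF eigenvalue: delta_lambda = lambda * (2/N)**0.5."""
    return eigenvalue * math.sqrt(2.0 / n_independent)

def effectively_degenerate(lam1, lam2, n_independent):
    """True if two neighboring eigenvalues are closer than the sampling error."""
    return abs(lam1 - lam2) <= north_error(lam1, n_independent)

rel_low = north_error(1.0, 200)    # relative error for the low-pass data
rel_band = north_error(1.0, 400)   # relative error for the band-pass data
```

With $N = 200$ the relative error is 10%, and with $N = 400$ it is about 7%.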

[Table 1 near here, please] 

Convergence of the EOF expansion is slow in both windows. For the band-pass window, 15 EOFs give only 43% of the total variance. In the low-pass window, seven EOFs give 50% of the variance, and they will prove sufficient for our analysis of low-frequency variability. The convergence for model data is much more rapid, due to their limited spatial resolution and simplified dynamics [compare Table 4 in Mo and Ghil, 1987].

3. METHODOLOGY 

Probability density estimation 

As indicated already in Sections 1 and 2, deterministic but nonlinear dynamics can generate time series of geophysical flow fields with an appearance of randomness [Ghil et al., 1985]. This randomness is associated, heuristically speaking, with a measure, or probability density function (pdf), which is invariant under the equations describing the dynamics. These equations are also said to generate a flow in the system's phase space, as each point in this space can be thought of as moving, or flowing, in time along the unique orbit, or trajectory, passing through it [Ghil and Childress, 1987, Section 5.4].

With this terminology, the pdf is said to be invariant under the flow. What is meant is simply that, if a set of points A in phase space is carried by the flow into a set B, then the measure, i.e., the total or cumulative probability, of the two sets is equal. For conservative dynamical systems, it is well known that such an invariant measure exists, is essentially unique and is just equal to the ordinary volume in the system's phase space. This result usually goes under the name of Liouville’s theorem.

For the forced, dissipative systems one encounters in geophysical fluid dynamics (GFD), the situation is somewhat more complicated. The flow in phase space is volume-reducing, rather than volume-preserving, and tends in general to a strange attractor [Lorenz, 1963; Ghil and Childress, 1987, Section 5.4]. A measure on such an attractor is known to exist under certain simplifying mathematical assumptions, called Axiom A, which essentially state that for every point on the attractor the linearization of the flow has no neutrally stable directions. Requiring that the measure behave essentially like length along the unstable directions also renders it unique. Furthermore, this unique measure is ergodic for almost all points near the attractor, as well as on the attractor. That means that any physically or numerically observable time averages along trajectories starting on or near the attractor will be equal to the corresponding ensemble average with respect to the pdf on the attractor [Eckmann and Ruelle, 1985, pp. 639-641, and references therein; Ghil et al., 1985, pp. 14-16].

The main point of our line of investigation is that this pdf is far from being either uniform or isotropic in the phase space of large-scale atmospheric flows. The most elementary form that such inhomogeneities can take is bimodality. In the case of the relatively simple phase-space flow induced by our model (Section 2), such bimodality is clearly a result of the greatly enhanced persistence near the two unstable stationary solutions with blocking and zonal-flow patterns, respectively. The approximate form of the pdf near these two generalized saddle points in phase space can actually be derived from the linearization of the equations at these points, and a small number of scaling parameters for the pdf can be determined from the data.

For the NH data set no such a priori form for the pdf can be derived, and one has to use the tools of nonparametric estimation theory. This is a particularly active field of modern statistics, which relies on an intelligent and systematic use of computer power rather than on "cookbook" formulae valid only for elementary problems involving well-known, classical pdfs. The methods of nonparametric theory permit the reliable estimation of differences between the mean and median of an arbitrary distribution [Efron, 1982] or of the multimodality of a pdf [Silverman, 1986; Tapia and Thompson, 1978], based on samples of moderate size.

The first nonparametric method we used is discrete maximum penalized likelihood estimation (DMPLE) [Silverman, 1986, Section 5.4; Tapia and Thompson, 1978, Chapter 5], which estimates a univariate pdf $\omega = \omega(z)$, subject to a smoothness constraint. Some such constraint, or regularization, is necessary for any consistent, stable estimation from noisy data. It plays a role similar to lag windows and frequency tapers in spectral analysis.

The likelihood function maximized is

$$L(\omega_{(m)}) = \prod_{i=1}^{n} s(z_i)\, \exp\Big\{-\frac{\alpha}{h} \sum_{j=1}^{m} (y_j - y_{j-1})^2\Big\}, \qquad (1a)$$

subject to

$$h \sum_{j=1}^{m} y_j = 1; \qquad y_j \ge 0, \quad j = 1, \ldots, m-1. \qquad (1b,c)$$

Here $\omega_{(m)} = \{y_0, y_1, \ldots, y_m\}$ is an approximation to the true pdf $\omega(z)$ at $m + 1$ equally spaced "nodes", or mesh points, with $y_0 = y_m = 0$ since $\omega(z)$ is assumed to be zero outside the interval considered. $h$ is the equal spacing between nodes. $s(z_i)$ is the value, at the datum $z_i$, of the linear-spline interpolant of the $y_j$ defined with respect to the $m$ equal subintervals. The $n$ data $z_i$ are unequally spaced, with $n \ge m$ necessarily and usually $n \gg m$. The sum in the exponent is a finite-difference approximation to $\int (d\omega/dz)^2\, dz$, and yields a smooth pdf estimate $\omega_{(m)}$ by minimizing the "wiggles" of $\omega(z)$ [Tapia and Thompson, 1978, Chapters 4 and 5].
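The following NumPy-only sketch maximizes the logarithm of (1a) under the constraints (1b,c). It is not the IMSL routine used in the text. Projected gradient ascent with step-size backtracking is our own choice of optimizer. Placing the nodes symmetrically about zero assumes a standardized variable, as with $m = 40$ and $h = 0.2$ below. The penalty weight `alpha` is fixed by the caller rather than chosen by the K-S test. All names are hypothetical.

```python
import numpy as np

def _project_scaled_simplex(v, total):
    """Euclidean projection of v onto {x : x >= 0, sum(x) = total}."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u) - total
    k = np.arange(1, v.size + 1)
    r = k[u - css / k > 0][-1]
    return np.maximum(v - css[r - 1] / r, 0.0)

def dmple(z, m=40, h=0.2, alpha=1.0, n_iter=500):
    """Maximize sum_i log s(z_i) - (alpha/h) sum_j (y_j - y_{j-1})**2,
    subject to h * sum(y) = 1, y >= 0 and y_0 = y_m = 0."""
    z = np.asarray(z, dtype=float)
    nodes = (np.arange(m + 1) - m / 2) * h           # m+1 nodes, symmetric about 0
    if np.any(z <= nodes[0]) or np.any(z >= nodes[-1]):
        raise ValueError("all data must lie strictly inside the node interval")
    k = np.minimum(((z - nodes[0]) // h).astype(int), m - 1)  # subinterval index
    t = (z - nodes[k]) / h                           # position inside subinterval

    def objective(y):
        s = (1 - t) * y[k] + t * y[k + 1]            # linear spline s(z_i)
        if np.any(s <= 0):
            return -np.inf
        return np.sum(np.log(s)) - (alpha / h) * np.sum(np.diff(y) ** 2)

    def gradient(y):
        s = (1 - t) * y[k] + t * y[k + 1]
        g = np.zeros(m + 1)
        np.add.at(g, k, (1 - t) / s)                 # log-likelihood part
        np.add.at(g, k + 1, t / s)
        d = np.diff(y)
        g[1:] -= 2 * (alpha / h) * d                 # roughness-penalty part
        g[:-1] += 2 * (alpha / h) * d
        return g

    y = np.zeros(m + 1)
    start = np.exp(-0.5 * nodes[1:-1] ** 2)          # Gaussian first guess
    y[1:-1] = start / (h * start.sum())              # satisfies h * sum(y) = 1
    f, step = objective(y), 1e-3
    for _ in range(n_iter):
        g = gradient(y)
        while True:
            cand = np.zeros(m + 1)                   # end nodes stay at zero
            cand[1:-1] = _project_scaled_simplex(y[1:-1] + step * g[1:-1], 1.0 / h)
            f_cand = objective(cand)
            if f_cand >= f or step < 1e-12:
                break
            step *= 0.5                              # backtrack
        if f_cand < f:
            break                                    # no ascent step found
        y, f, step = cand, f_cand, 2 * step
    return nodes, y
```

With $y_0 = y_m = 0$, the constraint $h\sum_j y_j = 1$ is exactly the integral of the piecewise-linear estimate. Projecting onto the scaled simplex therefore keeps every iterate a proper pdf.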

The variable $z$ chosen was a leading principal component of the data set of interest, as explained in Eq. (2) below. The maximization was carried out by the algorithm NDMPLE from the International Mathematics and Statistics Library (IMSL). This algorithm was also used by Benzi et al. [1986] and Sutera [1986] on NH winter data for December 1980 to February 1984; they chose the sum of the squared amplitudes of zonal wavenumbers two, three and four as the unique variable $z$ of their pdf $\omega(z)$.

We took $m = 40$ and $h = 0.2$, so that the total interval over which $\omega(z)$ is allowed to be nonzero equals 8 standard deviations of the variable of our choice. The smoothness parameter $\alpha$ was chosen by requiring the discrepancy between the estimated pdf $\omega_{(m)}(\alpha)$ and the theoretical limit pdf as $\alpha \to 0$, $\omega_{(m)}(0)$, which is an atomic measure concentrated at the $m + 1$ equidistant mesh points, to satisfy the Kolmogorov-Smirnov (K-S) test at a confidence level of 95% [Darling, 1957; Sutera, 1986]. The K-S test is distribution-free, i.e., it is independent of the shape of the pdf $\omega(z)$ approximated, provided $\omega$ is a continuous function of $z$ [Fisz, 1963, Section 10.11]. Robust approaches to the estimation of the regularization parameter $\alpha$ from the data involve various resampling plans, which are much more expensive computationally [Efron, 1982; Wahba and Wendelberger, 1980].


Cluster analysis 

The major drawback of the DMPLE approach to pdf estimation in phase space is 
that its extension to more than one variable is still prohibitively expensive. 
Bimodality with respect to a single dimension is an important first step in 
identifying multiple planetary flow regimes. But much more detail is needed to 
use these regimes effectively as a foundation for LRF. 

We therefore turned to cluster analysis [Anderberg, 1973; Silverman, 1986, Section 6.2], which is a flexible multivariate approach. To classify QS events objectively by the similarity of their flow patterns, one needs a quantitative measure of similarity. In Legras and Ghil [1985], root-mean-square distance between maps was used to study the proximity of persistent events to unstable equilibria. Pattern correlations of anomalies correspond to the cosine of the angle, with its vertex at the time mean, between two maps seen as points in phase space, rather than to their Euclidean distance. This measure is more sensitive to the meteorologically significant shape and phase of anomalies [Horel, 1985a; Mo, 1986; Mo and Ghil, 1987]. It was already used to identify QS events, and we used it for our cluster analysis.

We expand the time series $\phi(\mathbf{x}, t)$ of anomaly fields into EOFs

$$\phi(\mathbf{x}, t) \simeq \sum_{\nu=1}^{\nu_0} A_\nu(t)\, E_\nu(\mathbf{x}), \qquad (2)$$

where $\mathbf{x}$ is the spatial coordinate and $E_\nu$ is the $\nu$-th eigenfunction of the correlation matrix, in decreasing order of the associated eigenvalues $\lambda_\nu$ [see Table 4 in Mo and Ghil, 1987, and Table 1 here]. $A_\nu$ is the corresponding principal component (PC), and $\nu_0$ is the truncation of the EOF expansion selected for smoothing purposes [Barnett and Preisendorfer, 1978]. The pattern correlation between $\phi(\mathbf{x}, t_m)$ and $\phi(\mathbf{x}, t_n)$, so truncated, is then given as

$$\rho(t_m, t_n) = \frac{\sum_{\nu=1}^{\nu_0} A_\nu(t_m)\, A_\nu(t_n)}{\big\{\sum_{\nu=1}^{\nu_0} A_\nu^2(t_m)\big\}^{1/2} \big\{\sum_{\nu=1}^{\nu_0} A_\nu^2(t_n)\big\}^{1/2}}, \qquad (3)$$

due to the orthonormality of the EOFs.

When an anomaly (2) is small in magnitude, the pattern correlation between it and another anomaly cannot be expected to be meaningful. It was therefore found useful to define a cluster of small anomalies, for which the distance to the origin

$$d(t_n) = \Big\{\sum_{\nu=1}^{\nu_0} A_\nu^2(t_n)\Big\}^{1/2} \qquad (4)$$

is below a given threshold $d_0$.
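In PC space, the pattern correlation and the distance to the origin reduce to a few array operations. The NumPy sketch below assumes that the truncated PCs are stored as an array `A[t, nu]` with $\nu = 1, \ldots, \nu_0$, and that no map is exactly zero. The function name is hypothetical.

```python
import numpy as np

def pattern_correlation(A):
    """Pattern correlations between all pairs of maps and their distances
    to the origin, computed from truncated principal components A[t, nu]."""
    d = np.sqrt(np.sum(A ** 2, axis=1))      # distance to the origin, Eq. (4)
    U = A / d[:, None]                       # unit vectors in PC space
    rho = U @ U.T                            # cosines of angles between maps
    return rho, d
```

Because the EOFs are orthonormal, the inner product of two truncated maps equals the inner product of their PC vectors. This is why the whole computation can stay in the $\nu_0$-dimensional PC space.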

For a cluster $C = \{\phi_1, \ldots, \phi_n\}$ whose elements $\phi_j$ are anomaly maps $\phi(\mathbf{x}, t_j)$, we define the center $c$ as the arithmetic mean of its elements,

$$c = \frac{1}{n} \sum_{j=1}^{n} \phi_j. \qquad (5a)$$

Each element is interpreted as a vector in $\nu_0$-dimensional Euclidean space, $\phi_j = [A_1(t_j), \ldots, A_{\nu_0}(t_j)]$, so that Eq. (5a) is equivalent to $c = [C_1, \ldots, C_{\nu_0}]$, with

$$C_\nu = \frac{1}{n} \sum_{j=1}^{n} A_\nu(t_j). \qquad (5b)$$


Clustering criteria 

All clustering algorithms require two basic criteria: one to determine 
membership in a cluster, otherwise each point would form a cluster; the other 
to determine separation between clusters, otherwise all points would form a 
single cluster. In general, these criteria are chosen so that each point belongs to one and only one cluster (hard clustering), or so that each point belongs to one or more clusters (fuzzy clustering) [Bezdek, 1981].

In our application to large-scale atmospheric flows, it is quite clear 
from synoptic experience that sizable portions of phase space are visited only 
very rarely, so that considerable numbers of anomaly maps will be distributed 
quite thinly over these portions. There is no use in trying to associate 
these thinly populated portions of phase space with any planetary flow regime, 
as they are most unlikely to recur and will not help in any substantial way to 
either understand or predict low-frequency variability. Eliminating thus a 
considerable number of points from the clustering procedure will enhance the 
convergence rate of any specific clustering algorithm we choose. We depart 
therewith from other clustering approaches by formulating a third criterion, 
for non-membership in any cluster. Alternatively, this can be thought of as a 
criterion for membership in a special, larger cluster of nonrecurrent flow 
anomalies, into which the really interesting clusters are embedded. 

Recalling the other special cluster, of small anomalies, the five criteria we use are:

a) Membership criterion. The pattern correlation between the center $c$ of a cluster and any element $\phi_j$ in the cluster should exceed a threshold $r_1$,

$$\rho(c, \phi_j) = \frac{\sum_{\nu=1}^{\nu_0} C_\nu A_\nu(t_j)}{\big\{\sum_{\nu=1}^{\nu_0} C_\nu^2\big\}^{1/2}\, d(t_j)} > r_1. \qquad (6a)$$

Remembering the interpretation of $\rho(\phi', \phi'')$ as the cosine of an angle in phase space, requiring for instance that $r_1 = 0.86$, i.e., approximately $\cos 30°$, means that any two elements $\phi'$ and $\phi''$ in a given cluster subtend an angle of at most about 60° at the origin, i.e., that they correlate at about 0.5 or better. We shall use $r_1 \ge 0.8$.

b) Separation criterion. The pattern correlation between the centers of two clusters, $b$ and $c$, say, should not exceed a threshold $r_2$,

$$\rho(b, c) = \frac{\sum_{\nu=1}^{\nu_0} B_\nu C_\nu}{\big\{\sum_{\nu=1}^{\nu_0} B_\nu^2\big\}^{1/2} \big\{\sum_{\nu=1}^{\nu_0} C_\nu^2\big\}^{1/2}} \le r_2. \qquad (6b)$$

To prevent points from belonging to more than one cluster, we require that $\arccos r_2 > 2 \arccos r_1$. We shall use $r_2 \le 0.45$, which satisfies this requirement for the lowest value of $r_1$.

c) Exclusion criterion. If a map $\phi$ does not correlate sufficiently well with the center $c_k$ of any cluster,

$$\rho(\phi, c_k) < r_1 \quad \text{for all } k, \qquad (7a)$$

and it does not satisfy the separation criterion for at least one cluster, $c_{k_0}$ say,

$$\rho(\phi, c_{k_0}) > r_2, \qquad (7b)$$

then $\phi$ belongs to the nonrecurring cluster. Direct observational evidence for extratropical flows in either NH winter [Charney et al., 1981] or SH winter [Trenberth and Mo, 1983] suggests that this cluster should take up about 2/3 of all anomaly maps analyzed.

d) Small-anomaly criterion. A map $\phi(\mathbf{x}, t_n)$ belongs to the small-anomaly cluster, rather than to one of the significant clusters defined by (6a) or to the special, nonrecurrent cluster defined by (7), if its distance (4) to the origin satisfies

$$d(t_n) \le d_0 = \bar{d} - 1.8\, \sigma_d, \qquad (8)$$

where $\bar{d}$ is the mean distance of the time series of anomalies to the origin and $\sigma_d$ is the standard deviation of the distances about this mean [Mo and Ghil, 1987].

e) Small-cluster criterion. Clusters with fewer than $L_0$ elements are assigned to the special, nonrecurrent cluster. $L_0$ is taken as 25 for the model results and as 8 for the NH data. This accelerates the search and eliminates nonsignificant clusters.
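Criteria (a) to (e) can be sketched for the simpler case in which the cluster centers are already known. The search algorithm of Appendix A is not reproduced here. The integer labels and function names below are hypothetical. Maps that correlate with every center at $r_2$ or less get a separate label. This reflects our assumption that, in a full search, such maps would be candidates for new clusters.

```python
import numpy as np

# Hypothetical labels for the special clusters and for maps fitting no center.
SMALL, NONRECURRENT, UNASSIGNED = -1, -2, -3

def cluster_center(A, members):
    """Center of a cluster as the mean of its member maps, Eq. (5)."""
    return A[members].mean(axis=0)

def classify(A, centers, r1=0.85, r2=0.4, k_small=1.8, L0=25):
    """Label each map (a row of the truncated PCs A) relative to fixed centers."""
    C = np.asarray(centers, dtype=float)
    Cn = C / np.linalg.norm(C, axis=1)[:, None]
    between = Cn @ Cn.T
    if np.any(between[~np.eye(len(C), dtype=bool)] > r2):
        raise ValueError("centers violate the separation criterion (6b)")
    d = np.linalg.norm(A, axis=1)                # Eq. (4)
    d0 = d.mean() - k_small * d.std()            # Eq. (8)
    rho = (A / d[:, None]) @ Cn.T                # correlation with each center
    labels = np.full(A.shape[0], UNASSIGNED)
    for t in range(A.shape[0]):
        if d[t] <= d0:
            labels[t] = SMALL                    # criterion (d)
        elif rho[t].max() > r1:
            labels[t] = int(rho[t].argmax())     # criterion (a), Eq. (6a)
        elif rho[t].max() > r2:
            labels[t] = NONRECURRENT             # criterion (c), Eq. (7)
    for k in range(C.shape[0]):                  # criterion (e)
        members = labels == k
        if 0 < members.sum() < L0:
            labels[members] = NONRECURRENT
    return labels
```

Testing for small anomalies first follows the text, where a small-anomaly map goes to that cluster rather than to a significant or nonrecurrent one.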

The schematic diagram of our clustering criteria is given in Figure 2. 
The exact search and clustering algorithm is given in Appendix A. 
[Fig. 2 near here, please]


4. MODEL RESULTS 


Clusters 

For the time series of 7000 streamfunction anomaly maps based on 25 spherical harmonics, ten EOFs contain 91% of the total variance. Table 4 in Mo and Ghil [1987] shows that the first three, in fact, already contain 65%.

Cluster analysis using ten EOFs, $r_1 = 0.85$ and $r_2 = 0.4$, yielded five clusters. Using 12 EOFs, and varying $r_1$ between 0.8 and 0.86, with $0.34 \le r_2 \le 0.45$, yielded the same number of clusters. The flow patterns of their centers stayed much the same; only the number of elements in each cluster varied from one set of criterion values to another.

The clusters for ten EOFs, $r_1 = 0.85$ and $r_2 = 0.4$, are listed in Table 2, in decreasing order of the number of elements. The table also gives the distribution of persistence for passages within each cluster. Flow patterns of the centers of each cluster are shown in Figure 3.

[Table 2 and Figure 3 near here, please] 

Clusters 1 and 2 are the largest, with about 11% of the total number of points each. They are also the most stationary, being the only ones with a significant number of flow sequences persisting for longer than four sampling times within the cluster. Cluster 1 (Figure 3b) clearly resembles the model's zonal flow (compare Figure 1c), while Cluster 2 (Figure 3c) is associated with blocking (Figure 1d).

In Mo and Ghil [1987] we saw that the first EOF was nearly parallel to a line segment extending from the unstable blocking equilibrium to the unstable zonal equilibrium. This is now fully explained by the closeness of the dominant Clusters 1 and 2 to the respective unstable equilibria. Both $c_1$ and $c_2$ indeed have their largest components along EOF 1, with opposite signs (Figure 4).

[Figure 4 near here, please] 

The detailed distribution of persistence times in Clusters 1 and 2 shows that rapid passages through these stationary clusters are still the most frequent. In general, we expect the persistence time in either cluster to be most strongly correlated with the distance between the points of the sequence and the nearby unstable equilibrium [cf. Legras and Ghil, 1985, Figures 10, 13 and 16].

To verify this statement, we computed Euclidean distances between points in each flow sequence of a cluster and the center of that cluster, $c_1$ or $c_2$, as the case may be. We took the minimum distance, $d_{\min}$, corresponding to each sequence, and we averaged over all sequences with the same duration, in each cluster separately. The resulting values $\bar{d}_{\min}$ are listed, as a function of sequence duration, in Table 3.

[Table 3 near here, please]
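The procedure of the preceding paragraph can be sketched as follows. The sketch assumes that a passage is a maximal run of consecutive maps carrying the same cluster label, and that its duration is the number of maps in the run. The text does not spell out either convention, and the function name is hypothetical.

```python
import numpy as np
from collections import defaultdict

def mean_min_distance_by_duration(labels, X, center, k):
    """Average, over passages of equal duration through cluster k, of the
    minimum Euclidean distance between the passage's maps and the center.

    labels[t] is the cluster label of map t and X[t] its PC vector.
    """
    minima = defaultdict(list)
    t, n = 0, len(labels)
    while t < n:
        if labels[t] != k:
            t += 1
            continue
        start = t
        while t < n and labels[t] == k:
            t += 1                                   # extend the passage
        dist = np.linalg.norm(X[start:t] - center, axis=1)
        minima[t - start].append(dist.min())         # d_min for this passage
    return {dur: float(np.mean(v)) for dur, v in sorted(minima.items())}
```

The returned dictionary maps each passage duration to the averaged minimum distance, i.e., one column of a table like Table 3.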

The values of $\bar{d}_{\min}$ generally increase as the duration of the associated sequences decreases: in Cluster 1 from 0.45 for $170\tau$ to 2.32 for $2\tau$, and in Cluster 2 from 0.82 for $34\tau$ to 1.72 for $2\tau$. The increase is not perfectly monotonic, due to variations in the direction of approach to the unstable equilibrium and in the direction of ejection from its vicinity. But the correlation between close passage and persistence is clearly excellent.

The pattern correlation method (PCM), as defined in Section 2, identifies only some of the most persistent passages as QS events. First, the passages have to last $5\tau$ or longer. Second, the membership criterion of $r_1 = 0.85$ allows correlations between pairs within a cluster to be smaller than $\rho_0 = 0.5$: two maps each within $\arccos 0.85 \approx 32°$ of the center can be up to about 64° apart, i.e., correlate at only about 0.44. We shall return to a systematic comparison between the two approaches later in this section.

Clusters 3 (Figure 3d) and 4 (Figure 3e) have similar anomaly patterns, but with nearly opposite phases. Cluster 4 resembles the wave-train pattern obtained by the correlation method in Mo and Ghil [1987, Figure 9c]. Table 4 there indicates that this pattern has similarities with EOF 2, and Figure 4 here shows that $c_3$ and $c_4$ do have large components of opposite signs along this EOF, as well as along EOF 1. While EOF 1 is essentially determined by the two dominant clusters, the orthogonality constraint on EOF 2 prevents it from being uniquely determined by Clusters 3 and 4.

Cluster 5 (Figure 3f) is both the smallest and the least persistent, with
only three sequences lasting 2τ and none longer. It resembles the model's
unstable Zonal 2 equilibrium. All five clusters are well separated, the highest
correlation between any two centers being 0.38.

Correlating the time mean $\bar{\phi}$ of the time series with the anomaly
maps of the cluster centers $c_k$, $k = 1, 2, \ldots, 5$, yields the largest
correlation for $k = 1$, the zonal-flow cluster: $\rho(\bar{\phi}, c_1) = 0.73$.
The correlations between $\bar{\phi}$ and all the other cluster centers are
negative, and hence much smaller. This result could explain why certain
quasi-stationary wave patterns in NH winter have sometimes been interpreted as
amplifications of the climatology.


Fuzziness 

To study flow sequences whose patterns are more or less constant, but slowly
moving in physical space, we introduce a special concept of fuzziness. It is
inspired to some extent by, but distinct from, the classical fuzzy clustering
algorithms [Bezdek, 1981].

The centers $\{c_k\}$ of the clusters are kept fixed, but the number of points
belonging to each given cluster $C_k$ is increased by relaxing the membership
criterion to

$$\rho(\phi, c_k) > r_3, \qquad r_3 < r_1. \qquad (9a, b)$$


This allows a flow sequence with some maps already belonging to cluster
$C_k$, but also containing maps with a slightly displaced version of the main
feature, e.g., a slowly retrogressing ridge, to belong entirely to the enlarged,
fuzzy cluster.
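A minimal sketch of the relaxed criterion (9), with the centers held fixed and only the threshold lowered from $r_1$ to $r_3$. Assumptions not taken from the text: correlation is computed as the cosine of the angle between vectors of EOF coefficients, each map is assigned to its best-correlated center, and the separation requirement used when building the hard clusters is omitted. Function names and data are hypothetical.

```python
import math


def cos_corr(x, y):
    """Pattern correlation taken as the cosine of the angle between two
    anomaly vectors (e.g., vectors of EOF coefficients)."""
    dot = sum(a * b for a, b in zip(x, y))
    return dot / math.sqrt(sum(a * a for a in x) * sum(b * b for b in y))


def assign(maps, centers, threshold):
    """Label each map with the index of its best-correlated center if that
    correlation exceeds `threshold`, else None (outside all clusters)."""
    labels = []
    for m in maps:
        rhos = [cos_corr(m, c) for c in centers]
        k = max(range(len(rhos)), key=lambda j: rhos[j])
        labels.append(k if rhos[k] > threshold else None)
    return labels


centers = [[1.0, 0.0], [0.0, 1.0]]                    # fixed centers c_k
maps = [[1.0, 0.1], [1.0, 0.8], [0.2, 1.0], [-1.0, 0.2]]
hard = assign(maps, centers, threshold=0.85)          # r_1-type criterion
fuzzy = assign(maps, centers, threshold=0.65)         # relaxed, r_3 < r_1
print(hard)    # [0, None, 1, None]
print(fuzzy)   # [0, 0, 1, None]
```

Because the best-correlated center does not depend on the threshold, every hard member stays in the same fuzzy cluster; lowering the threshold only adds maps, such as the second one above.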

We note in passing that the method of complex correlations [e.g., Horel,
1984] has somewhat the same purpose, and we implemented a version of it. This
version only captured those flow sequences for which the motion of features is
strictly zonal, and we did not pursue a version free of this shortcoming.

We used the fuzzy membership criterion (9), with $r_3 = 0.65$. This value is
based on the requirement that correlations be statistically significant at the
95% level. For ten degrees of freedom, a simple algebraic transformation of the
classical Student t-test yields a lower bound on significant correlations of
0.63 [Fisz, 1963, pp. 429-430]; hence $r_3 = 0.65$, by rounding up, for our ten
EOFs.
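The bound of 0.63 can be reproduced by inverting $t = r\sqrt{\nu}/\sqrt{1 - r^2}$ for $r$, which gives $r = t/\sqrt{t^2 + \nu}$. The sketch below assumes a two-sided 95% test with $n = 10$ samples (one per EOF), so $\nu = n - 2 = 8$, and takes the tabulated quantile $t_{0.975,8} \approx 2.306$; this reading of "ten degrees of freedom" is our assumption, chosen because it reproduces the quoted 0.63. "Rounding up" is implemented here as rounding up to the next 0.05.

```python
import math


def critical_correlation(t_crit, n):
    """Smallest significant correlation for n samples, from inverting
    t = r*sqrt(n-2)/sqrt(1-r**2) for r."""
    nu = n - 2
    return t_crit / math.sqrt(t_crit ** 2 + nu)


# Two-sided 95% quantile of Student's t with 8 degrees of freedom
# (tabulated value, assumed here rather than computed).
T_975_DF8 = 2.306
r_crit = critical_correlation(T_975_DF8, n=10)
r3 = math.ceil(r_crit * 20) / 20        # round up to the next 0.05
print(round(r_crit, 3), r3)             # 0.632 0.65
```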

The results for the fuzzy clusters, now including the small-anomaly cluster,
cf. criterion (8), are given in Table 4. We concentrate on comparing the
following characteristic times with those of the hard clusters in Table 2: $T_d$
is the average duration of a passage in the cluster, while $T_w$ is the average
wandering time between exit from that cluster and entry into any other cluster.
A sequence in a cluster is termed persistent if it lasts for five time units or
longer, $T_d \ge 5\tau$. $T_p$ is the average duration of persistent sequences.
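These characteristic times can be computed from a sequence of per-map cluster labels. A minimal Python sketch, with time measured in steps (one step = one τ). Assumptions: each map has at most one label, None marks maps outside all clusters, and "entry into any other cluster" is read as the next entry into any cluster, a reentry into the same cluster included. A passage cut off by the end of the record contributes to $T_d$ but not to $T_w$. Names are hypothetical.

```python
def passage_statistics(labels, k, persistent_min=5):
    """Characteristic times, in time steps, for cluster k, from per-map
    labels (cluster index, or None outside all clusters).

    Returns (T_d, T_w, T_p): mean passage duration; mean wandering time
    from exit from k to the next entry into any cluster; mean duration of
    passages lasting at least `persistent_min` steps. None if undefined.
    """
    # Collapse the label sequence into runs: [label, length].
    runs = []
    for lab in labels:
        if runs and runs[-1][0] == lab:
            runs[-1][1] += 1
        else:
            runs.append([lab, 1])

    def mean(xs):
        return sum(xs) / len(xs) if xs else None

    durations, wanders = [], []
    for i, (lab, length) in enumerate(runs):
        if lab != k:
            continue
        durations.append(length)
        after = runs[i + 1:i + 3]
        if after and after[0][0] is not None:
            wanders.append(0)               # direct move into another cluster
        elif len(after) == 2:
            wanders.append(after[0][1])     # unlabeled gap, then an entry
        # otherwise the record ends before any new entry: not counted
    persistent = [d for d in durations if d >= persistent_min]
    return mean(durations), mean(wanders), mean(persistent)


labels = [0, 0, 0, 0, 0, 0, None, None, 1, 1, None, 0, 0, None]
print(passage_statistics(labels, 0))   # (4.0, 2.0, 6.0)
print(passage_statistics(labels, 1))   # (2.0, 1.0, None)
```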

[Table 4 near here, please]

Table 4 shows that relaxing the membership criterion has raised the total
number of elements in nontrivial clusters to 62% of all points, vs. 27% before,
more than double. Clusters 1 and 2 are still dominant, and most persistent. The
number of elements has increased the least for them, showing that they are
intrinsically stationary and well separated from all other clusters. The
numbers for the smaller clusters, 3, 4 and 5, have increased more than
threefold, indicating that they tend to contain sequences with slowly moving
features, rather than stationary ones.

The average residence time $T_d$ in Clusters 1 and 2 is ≈ 11τ, while for
Clusters 3 through 6 it is ≈ 2τ. The corresponding wandering times are
$T_w$ ≈ 5τ and $T_w$ ≈ 3.5τ, respectively. The wandering times are smaller for
all fuzzy clusters (Table 4) than for the hard clusters (Table 2). This is due
simply to the decrease in size of the diffuse, nonrecurrent cluster in the fuzzy
formulation.

The average persistence time in zonal Cluster 1 decreases from $T_d$ = 21.6τ
in Table 2 to $T_d$ = 10.5τ in Table 4, while the average duration of persistent
sequences goes from $T_p$ = 41τ to $T_p$ = 24τ. A change in the opposite
direction occurs for the blocked Cluster 2, with an increase of $T_d$ from 6τ to
11τ, and of $T_p$ from 10τ to 35.5τ, as cluster size increases due to fuzziness.
This is in agreement with the distribution of durations of persistent sequences
given in Figure 17 of Legras and Ghil [1985] and Figure 16a of Ghil [1987], if
we take into account that the fuzzy clusters (Table 4) include a larger number
of passages of the trajectory not so close to the corresponding unstable
equilibrium. Proportionately more such short events are captured by an increase
in cluster size for the zonal regime, as also seen directly from the two tables.

Changing the fuzziness parameter $r_3$ from 0.65 to 0.7 or to 0.55 yields
smaller or larger numbers of elements in each cluster, respectively. But the
relative stationarity of Clusters 1 and 2, and the transient character of
Clusters 3 through 6, remain the same.


Cluster Analysis and QS Events 


To compare the results of the PCM method with those of cluster analysis,
let $Q_1$ be the set of all maps belonging to a QS event and $Q_2$ the set of
all maps belonging to one of the five nonexceptional clusters. Let

$$T_1 = Q_1 \cap Q_2 \qquad (10a)$$

be the set of elements belonging to both $Q_1$ and $Q_2$, while

$$T_2 = Q_1 \cup Q_2 \qquad (10b)$$

is the set of elements belonging to either $Q_1$ or $Q_2$ (or both). The ratio

$$\gamma = \frac{\#(T_1)}{\#(T_2)} \qquad (11)$$

between the number $\#(T_1)$ of elements in $T_1$ and the number $\#(T_2)$ of
elements in $T_2$ gives a measure of the compatibility of the two methods. For
the fuzziness parameter $r_3 = 0.65$, we find $\gamma = 0.57$. This moderate
value is due to the large number of elements in Clusters 3 through 5 which are
ipso facto in $Q_2$, but not in $Q_1$.

We therefore consider $Q_2'$, the union of elements in Clusters 1 and 2 only,
and define $T_1'$ and $T_2'$ accordingly by replacing $Q_2$ by $Q_2'$ in Eqs.
(10a, b). The corresponding $\gamma' = 0.71$ is much larger than the previous
value of $\gamma = 0.57$, substantiating our designation of Clusters 1 and 2 as
stationary, or persistent, while the other clusters are justifiably termed
transient.
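The ratio (11) is the intersection-over-union of two sets, known elsewhere as the Jaccard index. A minimal sketch with hypothetical day indices, taking $Q_1$ as the maps in QS events and $Q_2$ as the maps in clusters; restricting $Q_2$ to Clusters 1 and 2 mimics the passage from $\gamma$ to $\gamma'$.

```python
def compatibility(q1, q2):
    """Ratio (11): #(Q1 & Q2) / #(Q1 | Q2), i.e., the Jaccard index of the
    two sets of map indices (here, days)."""
    q1, q2 = set(q1), set(q2)
    union = q1 | q2                          # T_2, Eq. (10b)
    if not union:
        raise ValueError("both sets are empty")
    return len(q1 & q2) / len(union)         # T_1 is q1 & q2, Eq. (10a)


# Hypothetical day indices.
qs_days = {1, 2, 3, 4, 5, 10, 11}            # Q_1: maps in QS events
clusters_12 = {2, 3, 4, 5, 6}                # maps in Clusters 1 and 2
clusters_345 = {20, 21, 22}                  # maps in Clusters 3 to 5
gamma = compatibility(qs_days, clusters_12 | clusters_345)   # with Q_2
gamma_prime = compatibility(qs_days, clusters_12)            # with Q_2'
print(round(gamma, 3), gamma_prime)          # 0.364 0.5
```

As in the text, dropping cluster maps that never coincide with QS events shrinks the union more than the intersection, so the ratio rises.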

Figure 5 shows the correlations between the time series of anomaly maps
$\phi(x, t)$, projected onto the first ten EOFs, and the centers $c_k$, $k = 1, 2$,
of Clusters 1 and 2, respectively. The index $Q(t)$, also shown, equals 1 if the
map is either in fuzzy Cluster 1 or in fuzzy Cluster 2, and 0 otherwise. It is
clear by inspection of this figure that all major QS events, and most minor
ones, are either in Cluster 1 or in Cluster 2.

[Fig. 5 near here, please] 

We can conclude our intercomparison between the PCM and cluster analysis 
with the following two remarks: 

1) Cluster analysis can identify both QS events and nonpersistent, but
frequently recurring, patterns. PCM, as used up to now, can only do the former.

2) PCM can take into account slowly moving features better than cluster
analysis. This is especially true for a few QS events which involve a gradual
transition from one cluster to another, or exit from and reentry into the same
cluster during one QS event.

Transitions between Clusters 

In Mo and Ghil [1987], we introduced the concept of a Markov chain of
transitions between planetary flow regimes, whose flow patterns were defined
there by the PCM (see also Figure 25 in Ghil [1987]). It turns out that a much
better description and understanding of such a Markov chain is obtained when
multiple flow regimes are defined by cluster analysis.

Table 5 gives the number of transitions from one cluster to another for
the six fuzzy clusters of Table 4. Clearly each flow sequence, or trajectory in
phase space, passes through the diffuse, nonrecurring cluster. This is neither
significant nor interesting, and is yet another reason to ignore this trivial
cluster. Notice that the total number of transitions, 1030, is much smaller than
the total number of elements in the six clusters, 4315, since we do not count
two successive maps within the same cluster as a reentry. The ratio of these
two numbers, 4315/1030 ≈ 4.19, is simply the average duration of a passage
through the six clusters, $T_d$ = 4.19τ (see Table 4).
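A minimal sketch of the transition counting, assuming each map carries at most one cluster label (None outside all clusters) and the record is treated as one continuous sequence; breaks between winters would need separate handling. Because a continuous record has one transition fewer than passages, the ratio of maps to transitions is essentially the mean passage duration when passages are numerous, which is why 4315/1030 approximates $T_d$. The function name and labels are hypothetical.

```python
from collections import Counter


def count_transitions(labels):
    """Count cluster-to-cluster transitions from per-map labels (cluster
    index, or None outside all clusters). Successive maps in the same
    cluster form one passage; leaving a cluster and later entering one,
    possibly the same one (a reentry), is one transition. Stretches of
    None between passages are skipped. Returns (Counter, n_passages)."""
    passages = []
    prev = None
    for lab in labels:
        if lab is not None and lab != prev:
            passages.append(lab)      # start of a new passage
        prev = lab
    return Counter(zip(passages, passages[1:])), len(passages)


labels = [0, 0, None, 1, 1, 1, 0, None, None, 0, 2]   # hypothetical
transitions, n_passages = count_transitions(labels)
n_maps = sum(lab is not None for lab in labels)
print(dict(transitions))      # {(0, 1): 1, (1, 0): 1, (0, 0): 1, (0, 2): 1}
print(n_maps / n_passages)    # 1.6 maps per passage on average
print(round(4315 / 1030, 2))  # 4.19, the ratio quoted for Table 5
```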

[Table 5 near here, please] 
The transition matrix in the table is neither symmetric nor diagonally
dominant. Except for Cluster 4, the largest entries in either row or column
occur off the diagonal, indicating that reentry is much less likely than
transition to another cluster.

In spite of the considerable dominance of Clusters 1 and 2, their diagonal
entries are among the smallest. Even more strikingly, there are no direct
transitions between the zonal-flow Cluster 1 and the blocking Cluster 2. To
go from a zonal-flow pattern to a blocking pattern, or vice versa, the flow has
to pass through at least one, and more typically two, transient but recurring
patterns. This is shown in the "flow diagram" of Figure 6.

[Fig. 6 near here, please]

In this graphic representation of our Markov chain, only those arrows have
been drawn which correspond to a number of transitions larger than that given
by equal probabilities, plus one standard deviation. Thus, for instance, there
are 113 transitions out of Cluster 1 into one of six clusters, for an
equal-probability value of 113/6 ≈ 19 transitions. Hence only the reentry
arrow, with 31 transitions, and the arrow to the wave-train Cluster 4, with 42
transitions, are shown. The figure emphasizes that transitions both to and from
Cluster 2 are likely only through Clusters 5 and 6, and that wave-train patterns
with opposite phases account for transitions to and from Cluster 1,
respectively.
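The arrow criterion can be checked with a short calculation. The text does not say how the standard deviation was computed; this sketch assumes the count into each of the six clusters is binomial with probability 1/6, which gives a standard deviation of about 4.0 and a threshold of about 22.8 for the 113 transitions out of Cluster 1. The function name is hypothetical.

```python
import math


def arrow_threshold(n_out, n_targets):
    """Equal-probability expectation plus one standard deviation, taking the
    count into each target as binomial(n_out, 1/n_targets) (an assumption)."""
    p = 1.0 / n_targets
    return n_out * p + math.sqrt(n_out * p * (1.0 - p))


threshold = arrow_threshold(113, 6)      # 18.83 + 3.96, about 22.8
for name, count in [("reentry into Cluster 1", 31), ("to Cluster 4", 42)]:
    print(name, count > threshold)        # both True: both arrows are drawn
```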

Bimodality

As explained in Section 3, bimodality with respect to one variable is the
simplest form that inhomogeneity of a pdf in phase space can take. Bistable
solutions to simple models of planetary flow over topography were obtained by
Charney and DeVore [1979], Hart [1979] and Pedlosky [1981]. The relevance of
bistability to low-frequency atmospheric variability was often taken to stand
or fall with the discovery of such bimodality in NH winter data. This is the
approach taken in particular by R. Benzi, A. R. Hansen, P. Malguzzi, A.
Speranza and A. Sutera in a number of recent publications [e.g., Benzi et al.,
1986; Hansen, 1986; Speranza, 1986; and references therein].

The picture of multiple planetary flow regimes which emerges from this
section and the following one is considerably more complex, and potentially
more applicable to LRF, than simple bimodality. Still, it is of interest to
verify the existence and sources of bimodality in both our model and our NH
data set.

Figure 7 exhibits the approximate pdfs obtained for the first and second PCs
of our data, by using the algorithm NDMPLE explained in Section 3. The
smoothness parameter α in Eq. (1a), obtained by applying the K-S test to yield a
confidence level of 95%, was α = 0.1.

[Fig. 7 near here, please]

The pdf of PC 1 (Figure 7a) is clearly non-Gaussian, with the largest peak,
or mode, near the mean and smaller modes at approximately +1 and -2 standard
deviations. The positions of these smaller peaks correspond roughly to the
projections onto EOF 1 of $c_2$ and $c_1$, the centers of the blocking and the
zonal-flow clusters, respectively (compare Figure 4). The pdf of PC 2 (Figure
7b) is significantly skewed towards positive values, but is unimodal.
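To illustrate how modes are read off a smoothed pdf, here is a minimal Python sketch that swaps in a plain Gaussian kernel density estimate for NDMPLE; NDMPLE itself is not reproduced, and its penalized-likelihood smoothing and K-S choice of α differ from a fixed kernel bandwidth. The sketch standardizes a hypothetical PC series, evaluates the density on a grid from -4 to +4 standard deviations, and counts interior local maxima.

```python
import math


def standardize(xs):
    """Subtract the mean and divide by the standard deviation."""
    m = sum(xs) / len(xs)
    s = math.sqrt(sum((x - m) ** 2 for x in xs) / len(xs))
    return [(x - m) / s for x in xs]


def gaussian_kde(data, h, grid):
    """Gaussian kernel density estimate with bandwidth h, evaluated on grid.
    This is a simple stand-in, NOT the NDMPLE algorithm of Section 3."""
    norm = 1.0 / (len(data) * h * math.sqrt(2.0 * math.pi))
    return [norm * sum(math.exp(-0.5 * ((x - d) / h) ** 2) for d in data)
            for x in grid]


def count_modes(density):
    """Count interior local maxima; a flat top is counted once."""
    return sum(1 for i in range(1, len(density) - 1)
               if density[i] > density[i - 1] and density[i] >= density[i + 1])


# Hypothetical PC values forming two well-separated groups.
group_a = [-2.0 + 0.1 * i for i in range(-5, 6)]
group_b = [1.0 + 0.1 * i for i in range(-5, 6)]
grid = [-4.0 + 0.01 * i for i in range(801)]   # -4 to +4 standard deviations

pc = standardize(group_a + group_b)
print(count_modes(gaussian_kde(pc, 0.3, grid)))  # 2
```

The bandwidth h plays a role loosely analogous to α: too small a value splits genuine modes into spurious ones, too large a value merges them.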

The same is true of the pdf for PC 3 (not shown). This is due to the fact 
that Clusters 1 and 2 have small components along EOFs 2 and 3, while Clusters 
3 through 6 are smaller and their distributions project without significant 
discontinuities onto these EOFs. 

A more complete picture of the situation is given by the two-dimensional
histogram of Figure 8. The points in our time series, projected onto EOFs 1
and 2, with the axes standardized as in Figure 7, were counted in boxes of
0.2 × 0.2 standard deviations. Two peaks on either side of the EOF 2 axis are
clearly associated with Clusters 1 and 2, respectively. The peak near the
origin is due mostly to the small-anomaly Cluster 6.
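The box counting behind Figure 8 amounts to a two-dimensional histogram. A minimal sketch, assuming both PCs have already been standardized as in Figure 7; box indices come from floor division, so values lying exactly on a box edge can be shifted by floating-point rounding. The function name and data are hypothetical.

```python
import math
from collections import Counter


def histogram_2d(pc1, pc2, box=0.2):
    """Count points (pc1[t], pc2[t]), with both PCs already standardized, in
    square boxes of side `box` standard deviations. Key (i, j) is the box
    [i*box, (i+1)*box) x [j*box, (j+1)*box)."""
    return Counter((math.floor(x / box), math.floor(y / box))
                   for x, y in zip(pc1, pc2))


# Hypothetical standardized PC values.
counts = histogram_2d([0.05, 0.15, -0.05, 0.3], [0.0, 0.1, 0.0, 0.39])
print(counts.most_common(1))   # [((0, 0), 2)]: the fullest box
```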

[Fig. 8 near here, please] 

Bimodality thus results from the existence of two dominant, stationary
clusters, associated with particularly persistent flow sequences. The
presence of additional, more transient clusters, detectable by other means,
tends to blur this simple way of looking at multiple regimes. Still, these
additional clusters, if statistically significant, can contribute to
understanding, as well as to predicting, low-frequency atmospheric variability:
they establish signposts along the preferred routes of transition between the
most obvious and persistent flow patterns, such as blocked and zonal flow.

5. NORTHERN HEMISPHERE 500 MB HEIGHTS

Bimodality for Low-Pass Filtered Data 

Bimodality was first illustrated in NH 500 mb data for the winter of
1981-1982 by Benzi et al. [1985] and for the four winters 1980-1984 in other
publications of the same group [Hansen, 1986; Sutera, 1986]. They applied the
NDMPLE algorithm to the univariate pdf obtained from their data set with
respect to a nonlinear functional: the sum of squares of the amplitudes of zonal
wavenumbers two, three and four, computed from the heights averaged over
latitudes 15N to 75N.
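The functional just described can be sketched as follows, assuming heights on a regular longitude grid, an unweighted mean over the latitude rows (any area weighting used by Benzi et al. is not stated here), and the usual convention that the amplitude of wavenumber m is twice the modulus of the corresponding discrete Fourier coefficient divided by the number of longitudes. Names and test data are hypothetical.

```python
import cmath
import math


def wave_amplitudes(z, wavenumbers=(2, 3, 4)):
    """Amplitude A_m of zonal wavenumber m for heights z at N equally spaced
    longitudes, in the convention z = a0 + sum_m A_m cos(m*lon - phase_m);
    valid for 0 < m < N/2."""
    n = len(z)
    amps = {}
    for m in wavenumbers:
        coeff = sum(zk * cmath.exp(-2j * math.pi * m * k / n)
                    for k, zk in enumerate(z))
        amps[m] = 2.0 * abs(coeff) / n
    return amps


def wave_index(z_by_lat, wavenumbers=(2, 3, 4)):
    """Average the heights over the latitude rows (unweighted, an
    assumption), then sum the squared amplitudes of the given wavenumbers."""
    nlat, nlon = len(z_by_lat), len(z_by_lat[0])
    zbar = [sum(row[k] for row in z_by_lat) / nlat for k in range(nlon)]
    return sum(a * a for a in wave_amplitudes(zbar, wavenumbers).values())


# Hypothetical check: 72 longitudes (5 degree spacing), two identical rows.
lons = [2.0 * math.pi * k / 72 for k in range(72)]
row = [5500.0 + 100.0 * math.cos(2 * x) + 50.0 * math.sin(3 * x)
       + 20.0 * math.cos(4 * x + 1.0) + 30.0 * math.cos(5 * x) for x in lons]
print(round(wave_index([row, row]), 6))  # 12900.0 = 100^2 + 50^2 + 20^2
```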

We start the detailed analysis of our data set of 20 NH winters (1963-1982)
by obtaining smooth approximations of univariate pdfs with respect to (linear)
projections onto EOFs 1, 2 and 3 of the low-pass filtered data (Figure 9).
The sources of bimodality in PC 1 (Figure 9a) and of considerable skewness in
PCs 2 and 3 (Figures 9b, c) will then be investigated further by cluster
analysis.

[Fig. 9 near here, please]

The three leading PCs were each standardized and gridded as in Sections 3
and 4. The values of the smoothness parameter α given by the 95% confidence
level of the K-S test are equal to 10 for all three. The light lines in
Figures 9a-c indicate the results for the full set of low-passed data. While
all three PCs show some measure of skewness, none is bimodal with any degree of
significance.

For the simple model, bimodality of the first PC resulted from the
persistent sequences in the dominant Clusters 1 and 2. We are thus led to
consider the distribution of QS events in NH winter data. These were already
studied from a slightly different point of view by Dole and Gordon [1983] and by
Horel [1985a,b]. In our data set, 522 days out of a total of 2400 daily maps
fall within QS events, defined as in Sections 2 and 4. The approximate pdfs for
this restricted data set are shown as heavy lines in Figure 9, in terms of the
original leading EOFs.

EOF 1 (Figure 9a) is now clearly bimodal, with excellent separation and a
highly significant magnitude of the two peaks. Values of α both much larger
and much smaller than the optimal one selected by the K-S test give the same
bimodal picture. EOFs 2 and 3 (Figures 9b, c) are strongly skewed, but not
significantly bimodal, as for the model (Figure 7).

We thus conclude that persistent anomalies of NH winter flow have preferred
locations in phase space. The total pdf of hemispheric flows is blurred,
however, by the more uniform distribution of the transient sequences of maps
connecting these locations. The total number of maps available did not permit
us to obtain statistically significant multi-dimensional histograms, and we
therefore turn to cluster analysis in order to clarify the situation further.
Clustering in the Low-Pass Filtered Data

Our low-passed data set was projected onto the seven leading EOFs, which
together represent 50% of its variance (see Table 1). Cluster analysis was
carried out in this seven-dimensional space, using $r_1 = 0.82$ and $r_2 = 0.34$
for the hard clusters. Eight nonexceptional clusters were obtained; they are
listed in Table 6 in decreasing order of the number of elements, together with
the positions of their centers.

[Table 6 near here, please]

These clusters were enlarged by using $r_3 = 0.65$, while keeping their
centers fixed. The persistence properties given in the table refer to these
enlarged, fuzzy clusters. The ninth, small-anomaly cluster, obtained by
criterion (8), was not enlarged; its properties are also given for
completeness.

Clustering calculations were repeated using nine EOFs, which account for
about 60% of the variance of the time series. Parameters were varied in the
ranges $0.8 \le r_1 \le 0.83$ and $0.34 \le r_2 \le 0.38$ for the hard clusters.
Clusters 1 through 6 were all reproducible, with Clusters 7 and 8 being less
stable.

Given larger data sets, techniques for objective estimation of clustering 
parameters from the data can be formulated, as for the univariate DMPLE pro- 
cedure. For the limited set at hand, the verification of the results lies 
mostly in the dynamical and climatological interpretation of the flow patterns 
obtained by cluster analysis and their relationship to NH patterns obtained by 
other methods. 

The flow patterns associated with fuzzy Clusters 1-6 are shown in Figures
10a-f, respectively. For each cluster, the figure shows the 500 mb height
field obtained by averaging the filtered anomaly maps over all elements in the
cluster. The plot thus shows the true center of the fuzzy cluster, as opposed
to that of the hard cluster used initially.

[Fig. 10 near here, please] 

To assess the statistical significance of the features in the figure, we
calculated the standard deviation σ(x) of the time series of anomalous heights
at each grid point x. The number N of independent samples for each cluster
was estimated, rather conservatively, as the total number of days spent in
that cluster divided by 10 days. The latter is considerably longer than the
mean duration $T_d$ of each sequence in any cluster (see Table 6), so that we
are looking essentially at N independent passages through each cluster.
Assuming a normal distribution of anomalies at each grid point, we used σ(x)
and N to determine the points at which the mean anomaly value was different
from zero at the 95% level of significance. Areas for which this statistical
significance criterion is satisfied are shaded in Figures 10a-f.
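A minimal sketch of this test at individual grid points, assuming a two-sided normal test (critical value 1.96) on the composite mean, with standard error σ(x)/√N and N = days in cluster / 10. The function name and anomaly values are hypothetical; 301 days is simply the fuzzy size of Cluster 1 quoted below, used for illustration.

```python
import math


def significant_points(mean_anom, sigma, days_in_cluster,
                       days_per_sample=10.0, z_crit=1.96):
    """Flag grid points where a cluster's composite mean anomaly differs from
    zero at the 95% level (two-sided normal test): |mean| > z * sigma/sqrt(N),
    with N = days_in_cluster / days_per_sample independent samples."""
    n_eff = days_in_cluster / days_per_sample
    factor = z_crit / math.sqrt(n_eff)
    return [abs(m) > factor * s for m, s in zip(mean_anom, sigma)]


# Hypothetical values (meters) at three grid points, cluster of 301 days.
flags = significant_points([40.0, -30.0, -50.0], [100.0, 100.0, 100.0], 301)
print(flags)  # [True, False, True]; the threshold is about 35.7 m here
```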

Cluster 1: wavenumber-three pattern (Figure 10a). This cluster is the
largest as a hard cluster (130 maps) and second largest as a fuzzy cluster
(301 maps). It has a clear zonal wavenumber-three pattern. In the Pacific
sector, the anomaly map closely resembles the one-point correlation map for
the base point (55N, 115W), called the Pacific/North American (PNA) pattern by
Wallace and Gutzler [1981]. In the complementary 180 degrees of longitude it
resembles the wave train called the Eurasian teleconnection pattern by the
same authors.

The average residence time in this cluster is $T_d$ = 6.5 days, and the
wandering time once the flow leaves this cluster and enters another one is
$T_w$ = 8 days. There are 13 persistent sequences in this cluster, with an
average duration of $T_p$ = 10 days, and they are rather evenly distributed
throughout the data set. But the hard part of the cluster was much better
represented during the latter ten winters of the time series.

Cluster 2: reverse wavenumber-three pattern (Figure 10b). This cluster 
is second largest according to hard criteria (114 maps) and third largest 
according to the fuzzy criterion (278 maps). Zonal wavenumber three is as 
prominent as for Cluster 1. Each anomalous high of Cluster 1 is matched by a 
low of Cluster 2, and vice-versa. But the features are not merely of opposite 
sign: they are all slightly displaced and distorted. 

The North Pacific high in Cluster 2 is much more elongated and flatter 
than the Aleutian low in Cluster 1. The Western Canadian low is much smaller 
and weaker in Cluster 2 than the central high of the PNA pattern in Cluster 1. 
Finally, the North European high in Cluster 2 is much larger and stronger than 
the Scandinavian low in Cluster 1. 

Clearly the dominant climatological effect associated with Cluster 1 is
the extensively studied Pacific influence on North America. The dominant
regional feature associated with Cluster 2 is the wave train teleconnecting
the Eastern United States, via Greenland, to Northern Europe [Dickson and
Namias, 1976].

Table 6 shows that the centers $c_1$ and $c_2$ of the two clusters have their
largest components, of opposite sign, along EOF 1. The situation is thus
quite similar to Clusters 1 and 2 in the model (Section 4) and to the
quasi-stationary regimes found by the PCM for the Southern Hemisphere by Mo and
Ghil [1987].

There are 12 persistent sequences in this cluster, with an average duration
of $T_p$ = 8.5 days, and they are evenly distributed throughout the time series.
The hard part of the cluster is quite reproducible using first one half of the 
data set, then the other. 
Cluster 3: wavenumber-two pattern (Figure 10c). This cluster is third
largest as a hard cluster, but largest as a fuzzy cluster (323 maps). Zonal
wavenumber two is dominant. Positive anomalies cover Canada and most of the
North Atlantic north of 30N. A small low is centered over the Western
Mediterranean and a large high is centered on the Ural Mountains. This
feature is reminiscent of the regional anomaly pattern centered on the Northern
Soviet Union (NSU) studied by Dole [1986]. Finally, a moderately large Aleutian
low lies north of a small high in the Central North Pacific, indicating zonal
flow in the Pacific sector [White and Clark, 1975].

There are 12 persistent sequences in this cluster; their average duration
$T_p$ is given in Table 6. The cluster is reproducible in both halves of the
data set.

Cluster 3 appears to contain most of the interannual variability of the NH
wintertime circulation. To check this, instead of removing the seasonal cycle
as in Section 2, we removed, separately for each year in the data and at each
grid point, the mean of that year, as well as the annual and semiannual
components (365 days and 182.5 days) of a Fourier expansion for that year. The
clustering computations were then repeated for the anomaly maps so defined.
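A minimal sketch of this per-year adjustment at one grid point, assuming a full 365-day year of daily values (the 182.5-day period is then the second harmonic). Over exactly one year the sampled cosines and sines are mutually orthogonal and orthogonal to the constant, so projecting onto them is equivalent to a least-squares fit. The function name is hypothetical, and leap years or partial years would need a genuine least-squares solve.

```python
import math


def remove_mean_and_harmonics(x, year_length=365):
    """Remove a year's mean and its annual (365-day) and semiannual
    (182.5-day) Fourier components from daily values x at one grid point.
    Assumes exactly one 365-day year, so that the sampled sines and cosines
    are orthogonal and simple projection gives the least-squares fit."""
    n = len(x)
    if n != year_length:
        raise ValueError("expected exactly one year of daily values")
    mean = sum(x) / n
    resid = [v - mean for v in x]
    for cycles in (1, 2):                       # annual, semiannual
        w = 2.0 * math.pi * cycles / n
        c = [math.cos(w * t) for t in range(n)]
        s = [math.sin(w * t) for t in range(n)]
        a = 2.0 / n * sum(r * ci for r, ci in zip(resid, c))
        b = 2.0 / n * sum(r * si for r, si in zip(resid, s))
        resid = [r - a * ci - b * si for r, ci, si in zip(resid, c, s)]
    return resid


# Hypothetical year: mean + annual + semiannual cycle + a 5-cycle "anomaly".
x = [10.0 + 3.0 * math.cos(2 * math.pi * t / 365)
     - 2.0 * math.sin(2 * math.pi * t / 182.5)
     + 0.5 * math.cos(2 * math.pi * 5 * t / 365) for t in range(365)]
anom = remove_mean_and_harmonics(x)   # leaves only the 5-cycle component
```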

The composite anomaly map for Cluster 3 obtained in this way is given in 
Figure 11. All the features are much weaker, and hardly any of them are 
statistically significant, when compared with Figure 10c. Similar comparisons 
for the other clusters show only insignificant differences. 

[Fig. 11 near here, please]

Cluster 4: double blocking (Figure 10d). This cluster is fourth largest
as a hard cluster, and fifth as a fuzzy cluster. Like Cluster 3, it is
dominated by zonal wavenumber two, with a general aspect of phase opposition to
the previous cluster. This is related to the two clusters having large
components of opposite sign along EOF 2 (see Table 6). Thus EOF 2 is largely
determined by the presence of Clusters 3 and 4, in the same way that EOF 1 is
determined by Clusters 1 and 2.

Cluster 4 shows strong north-south oscillations in both the Pacific and
the Atlantic sectors. In fact, their concomitant appearance might be related
to the zonally symmetric seesaw in sea-level pressures first noticed by Lorenz
[1951] and discussed in the context of the North Atlantic oscillation (NAO)
and the North Pacific one by Wallace and Gutzler [1981].

The features in the Atlantic sector strikingly resemble the Greenland-Northern
Europe seesaw of van Loon and Rogers [1976]. In the Pacific sector, there is a
dominant high centered on the northeastern tip of Siberia, accompanied by deep
lows over both the Okhotsk Sea and the Central North Pacific. The former is
associated with the Western Pacific teleconnection pattern of Wallace and
Gutzler [1981]. Some of these patterns, especially the NAO, have also been
brought out by the rotated EOFs of Barnston and Livezey [1987]. Indeed, rotated
EOFs have greater freedom to point at clusters in phase space, being less
inhibited by orthogonality constraints. Oblique, or target, rotation should
point even more accurately at clusters, orthogonality being replaced by
statistical independence.

Cluster 4 is also reproducible in both halves of the data set. There are
eight persistent sequences, with an average duration of $T_p$ = 9.5 days. In
fact, 75 days, out of a total of 86.5 days spent in the cluster, belong to
persistent sequences, the largest fraction of any cluster.

All eight persistent sequences have pronounced ridges in both the Pacific
and the Atlantic Oceans. Many intense, high-latitude double-blocking cases,
such as those in the winters of 1963, 1968, 1977 and 1980, belong to this
cluster.
Cluster 5: wave train (Figure 10e). This cluster is fifth by hard
criteria (76 maps) and fourth by the fuzzy criterion (261 maps). It shows a
strong high in the Gulf of Alaska, with flow parallel to the Rocky Mountains,
as discussed by Wallace and Blackmon [1983, Figure 3.16].

The rest of the anomaly map is taken up by one huge wave train of
alternating positive and negative anomalies, extending from off the
southeastern coast of the United States across Eurasia to Eastern Siberia. The
strongest feature in the wave train is the Icelandic low, with the second
strongest being another low over the Ural Mountains. The dipole over the
Western Atlantic resembles the pattern given that name by Wallace and Gutzler
[1981], and the entire wave train resembles their difference of composites for
the positive and negative phases of the Western Atlantic teleconnection [their
Figure 21c].

The center $c_5$ of this cluster has its largest components along EOFs 1 and
3, with signs opposite to those of Cluster 3. This is reflected in the dominant
spatial features, south of Alaska, south of Greenland and over the Urals,
having opposite signs for Clusters 3 and 5. But we saw that Cluster 3 still has
mainly a wavenumber-two pattern, while wavenumber four is dominant here and for
Cluster 6.

There are nine persistent sequences in this cluster, with $T_p$ = 8 days. Of
these, three are in the first half of the data set, and six in the second half.

Cluster 6: PNA pattern (Figure 10f). This is the last statistically
significant cluster. It has a striking PNA pattern in its negative phase, as
defined by Wallace and Gutzler [1981, Figure 17]. There are six persistent
sequences, with $T_p$ = 10 days, and five of them correspond to blocking in the
central Pacific, as described by White and Clark [1975]. This cluster is also
reproducible in both halves of the data set.
The features in the sector from 60W to 120E are weak and not statistically
significant. This contrasts with Cluster 5, where features in this sector are
very prominent. Large components of $c_5$ and $c_6$ with opposite signs along
EOF 3 show that this pair of clusters contributes decisively to this EOF. The
partial localization of spatial features for this pair is analogous to that
observed for Clusters 1 and 2, determining EOF 1, and to that for Clusters 3
and 4, determining EOF 2.

The features in the Pacific sector closely resemble the positive composite
of locally defined height anomalies for Pacific (PAC) reference points of Dole
[1986, Figure 1a]. We shall return to the complementarity of the different ways
of viewing persistent anomalies, spatially, spectrally, through EOFs and
through teleconnection patterns, in our concluding remarks.

Transitions between Low-Pass Clusters 

From Table 6, it is clear that the relative fraction of maps in all
clusters, 41%, is much less than for the simple, low-resolution model of
Section 4 (see Tables 2 and 3). As a consequence, the average time spent by
the atmosphere between clusters, $T_w$ = 9.5 days, is almost twice as long as
the time spent in the clusters, $T_d$ = 5.5 days. This is essentially due to
the much larger number of degrees of freedom of the atmosphere's low-frequency
variability. In fact, it is both surprising and encouraging that the ratio
$T_w/T_d$ is not any larger than obtained here.

Figure 12 shows each of the correlations $R_k(t)$ between the time series of
NH 500 mb anomaly maps, filtered in time and space as indicated, and the
cluster centers $c_k$, $k = 1, 2, \ldots, 8$. Also plotted is the indicator
function $Q(t)$ of the set of QS events, i.e., $Q(t) = 1$ if $\phi(x, t)$ is part
of a QS event, and $Q(t) = 0$ otherwise. Visual inspection of the figure clearly
indicates that all major QS events are associated with passage through a
cluster.

[Fig. 12 near here, please]

Following Section 4, we calculated from the obvious counterpart of Eqs.
(10, 11) $\gamma = \#(T_1)/\#(T_2) = 0.42$. As expected, since atmospheric
behavior is more complex than that of a simple model, γ here is lower than the
value of 0.57 for the model. On the whole, this still indicates good agreement
between the PCM and cluster analysis in identifying preferred flow regimes.
The discrepancy results, on the one hand, from QS events not being entirely
confined to clusters, although at least some of the successive maps of a QS
event typically belong to a cluster. On the other hand, considerable numbers of
points in each cluster do not belong to any QS event.

For the model, many long persistences are associated with particularly
close passages of the trajectory by an unstable equilibrium. To determine
whether this is actually the case for large-scale atmospheric flow, models
with much higher spatial resolution and greater physical realism than that of
Legras and Ghil [1985] need to be analyzed with the same degree of care and
detail. This is entirely possible on existing supercomputers, and we expect to
carry out such analyses in the near future.

Table 7 gives the transition matrix for the modified Markov chain of NH
low-pass filtered variability, as sampled from our data set of 20 winters of
120 days each. Comparison of Tables 5 and 7 here with Tables 5 and 8 in Mo
and Ghil [1987] shows an important advantage of cluster analysis over the PCM
for identifying preferred regimes: the number of transitions here is
considerably higher, providing a much larger, although still insufficient,
sample for a stable estimate of the true transition probabilities between
regimes. As explained in Ghil [1987] and Mo and Ghil [1987], the only way that
stable, reliable estimates of such a transition matrix can be obtained in the
near future is by careful and extended experimentation with general
circulation models.

[Table 7 near here, please] 

Thus Table 7 cannot be used for specific LRF predictions as yet. But it 
can be used for general guidance as to how LRF might proceed in the not-too- 
distant future. 

As in Table 5, we notice that diagonal elements are generally small, i.e.,
reentry into the same cluster is rather unlikely. In a sense, this confirms
that our fuzzy definition of the clusters is quite appropriate: smaller
clusters would show more reentries, and so would much larger clusters.

The matrix is far from symmetric: preferred paths are in evidence. 
These are illustrated in Figure 13. As in Figure 6, only those arrows are 
drawn which correspond to a probability of transition significantly higher 
than given by equal chances. 
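The significance rule used for the arrows, spelled out in the note to Table 5 as the equal-chance average number of transitions out of a cluster plus one standard deviation, can be applied mechanically. In this sketch the average is the row sum divided by the number of destination clusters, and the standard deviations are passed in, since how they were estimated is not stated here; the function name is hypothetical. For the first row of Table 5 this gives a threshold of about 20.2 rather than the quoted 20.3, because the table rounds the average to 19.

```python
import numpy as np


def significant_transitions(counts, row_std):
    """Flag entries exceeding the row's equal-chance average plus one standard deviation.

    counts  : (n_from, n_to) transition counts, rows = "from", columns = "to"
    row_std : one standard deviation per row, supplied by the caller
    """
    counts = np.asarray(counts, dtype=float)
    threshold = counts.mean(axis=1, keepdims=True) + np.asarray(row_std, dtype=float)[:, None]
    return counts > threshold


# First two rows of Table 5 and their listed standard deviations
rows = [[31, 0, 18, 42, 16, 6],
        [0, 20, 4, 14, 47, 23]]
mask = significant_transitions(rows, [1.33, 1.53])
```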

[Fig. 13 near here, please]

The small-anomaly Cluster 9 plays the role of a crossroads, even more important
than in the model. Trajectories exiting from Clusters 1, 2, 3 and 5 have a
relatively high likelihood of passing through Cluster 9 before continuing to
Clusters 1, 2, 3 or 7(?). As explained in our previous publications, this
does not indicate that the clusters represent certain linear instabilities of
the time-mean flow, but rather that the time mean happens to lie close to the
point where the boundaries of several attractor basins touch, permitting slow
transitions between dominant clusters [Grebogi et al., 1983; Ghil and Childress,
1987, Sections 6.4 and 6.6].

In the atmosphere, certain higher-frequency phenomena, not represented in
the equivalent-barotropic model of Section 4, also play a role in the
transitions between relatively stable, stationary clusters. To begin
understanding this role, we turn to the band-pass window of variability.

Band-Pass Clusters 

We used six EOFs, $r_1 = 0.82$ and $r_2 = 0.36$ to perform cluster analysis on
the band-passed anomaly maps. Six EOFs provide only 24.2% of the variance in
this window. But there is an obvious discontinuity at this level in the
variance spectrum, from 3% to 2.5%, which is statistically significant by the
rule of thumb of North et al. [1982] (see Section 2). Calculations were
repeated with eight and nine EOFs and with different values of $r_1$ and $r_2$.
The results below were essentially unchanged.
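The variance bookkeeping behind this choice can be sketched as follows. The gap test is one simple reading of the North et al. [1982] rule of thumb, our assumption rather than the exact procedure used here: a truncation after EOF k is treated as clean when the drop from EOF k to EOF k+1 exceeds the sum of the two sampling uncertainties. The numbers are the first ten band-pass entries of Table 1; the function names are hypothetical.

```python
import numpy as np


def cumulative_variance(pct):
    """Running total of explained variance (%) for the leading EOFs."""
    return np.cumsum(np.asarray(pct, dtype=float))


def separable_cutoffs(pct, err):
    """Numbers of retained EOFs k for which the gap between EOF k and EOF k+1
    exceeds the combined sampling errors of the two eigenvalues."""
    pct, err = np.asarray(pct, dtype=float), np.asarray(err, dtype=float)
    gaps = pct[:-1] - pct[1:]
    return [k + 1 for k in range(len(gaps)) if gaps[k] > err[k] + err[k + 1]]


# Leading band-pass EOFs from Table 1 (percent variance and uncertainty)
band_pct = [5.01, 4.85, 4.26, 4.09, 3.00, 3.00, 2.49, 2.42, 2.29, 2.13]
band_err = [0.35, 0.34, 0.30, 0.28, 0.21, 0.21, 0.17, 0.17, 0.16, 0.15]
cutoffs = separable_cutoffs(band_pct, band_err)
total6 = cumulative_variance(band_pct)[5]
```

With these values the test accepts truncation after EOF 4 and after EOF 6, and the six retained EOFs carry about 24.2% of the variance. EOFs 5 and 6 (3.00% each) cannot be separated from one another, which is harmless when both are kept.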

There are seven distinct clusters, including one of small anomalies. The
number of elements in the hard clusters varies from 65 to 94 maps. Using a
fuzziness criterion of 0.65 for the correlation with the hard-cluster centers,
the augmented clusters range from 221 to 360 elements. The size of the clusters
varies much less than in the low-passed data, so we arrange them by flow
patterns, rather than size.
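The fuzzy enlargement can be sketched as a thresholded pattern correlation against the fixed hard-cluster centers. Two details are our assumptions: pattern correlation is taken as the uncentered cosine of the angle between anomaly vectors, and a map may join more than one augmented cluster. The toy three-component maps and the function names are hypothetical; the threshold 0.65 is the one quoted above.

```python
import numpy as np


def pattern_correlation(maps, centers):
    """Cosine of the angle between each anomaly map (rows of `maps`) and each center.
    Maps with zero amplitude are not handled; they belong to the small-anomaly cluster."""
    maps = np.asarray(maps, dtype=float)
    centers = np.asarray(centers, dtype=float)
    m = maps / np.linalg.norm(maps, axis=1, keepdims=True)
    c = centers / np.linalg.norm(centers, axis=1, keepdims=True)
    return m @ c.T


def fuzzy_members(maps, hard_centers, r_fuzzy=0.65):
    """Boolean (maps x clusters) matrix: a map joins every augmented cluster whose
    fixed hard-cluster center it correlates with above r_fuzzy."""
    return pattern_correlation(maps, hard_centers) > r_fuzzy


centers = np.eye(3)[:2]                      # two toy hard-cluster centers
maps = np.array([[1.0, 0.0, 0.0],
                 [0.8, 0.6, 0.0],
                 [0.7, 0.7, 0.0],            # close to both centers
                 [0.0, 1.0, 0.0],
                 [0.0, 0.0, 1.0]])           # close to neither
members = fuzzy_members(maps, centers)
```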

Figure 14 shows the mean anomaly maps of the six nonexceptional band-pass
clusters, as for the low-pass window (Figure 10). In agreement with the results
of Blackmon et al. [1984a, b], all clusters in this window (called in the latter
articles "short time scales", as opposed to "intermediate time scales" of 10 to
30 days, and "long time scales" of more than 30 days) show essentially wave
trains elongated in the meridional direction, propagating zonally with a
wavenumber of seven or eight.

[Fig. 14 near here, please]

Clusters 1 and 2 (Figures 14a, b) have a well-defined wave-train structure 
in the jet exit region over the Eastern United States and the Western Atlantic. 
The two clusters are distinguished from each other by their wave trains being 
roughly in quadrature.

Clusters 3 and 4 (Figures 14c, d) have the most pronounced features of 
their wave train over the Western Pacific. The same quadrature of phase 
obtains as for the Atlantic clusters. Clusters 5 and 6 (Figures 14e, f) have
a well-developed wave train over both oceans and are weaker only over Eurasia;
the wave activity is still strongest in the Atlantic jet exit region.

The spatial localization of baroclinic wave activity is a topic of
considerable recent interest [Brevdo, 1987; Merkine, 1977; Pierrehumbert,
1986]. Our clustering procedure detects this localization and, through the
separation criterion $\rho(c_j, c_k) \le r_2 = 0.36$ between cluster centers,
avoids yielding arbitrarily close successive phases of the synoptic-scale waves.
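Why this criterion suppresses near-duplicate phases is easy to check on a pure zonal wave: the pattern correlation between cos(mλ) and cos(mλ + φ) on a full latitude circle equals cos φ, so requiring correlations of at most 0.36 keeps cluster centers at least arccos 0.36 ≈ 69° apart in phase, while clusters in quadrature (φ = 90°) pass easily. The one-dimensional wave maps below are a hypothetical stand-in for real anomaly fields, and the function names are illustrative.

```python
import numpy as np
from itertools import combinations


def overlapping_pairs(centers, r2=0.36):
    """Return index pairs (j, k) of cluster centers whose pattern correlation
    exceeds r2, i.e. pairs that violate the separation criterion."""
    c = np.asarray(centers, dtype=float)
    c = c / np.linalg.norm(c, axis=1, keepdims=True)
    corr = c @ c.T
    return [(j, k) for j, k in combinations(range(len(c)), 2) if corr[j, k] > r2]


def wave(phase_deg, wavenumber=7, n_lon=360):
    """Toy zonal wave train cos(m * lambda + phase) on a regular longitude grid."""
    lon = np.linspace(0.0, 2 * np.pi, n_lon, endpoint=False)
    return np.cos(wavenumber * lon + np.radians(phase_deg))


quadrature = [wave(0), wave(90)]               # correlation cos(90 deg) = 0
close_phases = [wave(0), wave(30), wave(90)]   # 30 deg and 60 deg neighbours
```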

The role of band-pass clusters in transitions between low-pass clusters is 
obviously important, and will be the subject of a subsequent paper. We
expect them to serve as way stations on preferred transition paths within what 
appears merely as a diffuse, thin cloud of points in the low-pass, mostly 
barotropic window of variability. 

6. CONCLUDING REMARKS 

An Approach to Long-Range Forecasting, and a Simple Model 

Our study of low-frequency atmospheric variability is guided by the 
practical concerns of long-range forecasting (LRF). Due to the well-known 
limits on detailed, pointwise predictability, one cannot expect to predict 
local weather with useful accuracy beyond 10 days, say, in a manner uniformly 
valid over all atmospheric states. 

The best hope for LRF, therefore, is that certain large-scale atmospheric
flow patterns are more persistent than others, that these patterns fall into a
few identifiable classes, or flow regimes, and that these regimes exhibit
well-defined transition probabilities from one to another. The theoretical
question is then to find how atmospheric dynamics generates these regimes, and 
their preferred transition patterns. The practical question is to extract 
from existing data and models the quantitative information on regime identifica- 
tion, expected duration and most likely successor. In the present section, we 
summarize our results in this perspective and sketch some promising directions 
for necessary research. 

The first step on the proposed road to LRF is identification of multiple
regimes. We prefer the term planetary flow regime to the earlier "weather
regime" of Reinhold and Pierrehumbert [1982], since weather is precisely what
does not persist and cannot be predicted. It clearly plays a role in maintain- 
ing the large-scale, low-frequency flow patterns, but what this role might be is 
one of the more difficult questions of the whole field of LRF [Wallace and 
Blackmon, 1983, pp. 89-90]. 

To identify these regimes, we developed a modification of standard cluster 
analysis methods. This modification takes into account well-known features of 
low-frequency atmospheric variability, not present in other applications of 
cluster analysis, and enhances the convergence of classical algorithms. We 
chose a hard clustering algorithm, based on pattern correlations as a measure 
of distance between points in phase space. The modification introduced allows 
for a thin cloud of non-classified points in which the clusters are embedded,
and for a special cluster of small anomalies, based on Euclidean, or root-mean- 
square, distance between each point and the grand mean of all points in the data 
set. Our modification further enlarges the nonexceptional hard clusters so 
obtained by a fuzziness criterion, admitting points with a preset correlation, 
lower than that used in hard clustering, to the fixed centers of the hard 
clusters. Reasonable changes in the values of the clustering parameters did not 
change our results in any substantial way.
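The small-anomaly cluster relies on amplitude rather than shape, presumably because pattern correlation is insensitive to amplitude and so cannot distinguish weak anomalies from strong ones of similar shape. A minimal sketch of the Euclidean (root-mean-square) test follows; the threshold value and the function name are hypothetical, since the actual cutoff is not given in this section.

```python
import numpy as np


def small_anomaly_mask(maps, threshold):
    """True for maps whose root-mean-square distance from the grand mean of all
    maps is below `threshold` (a hypothetical tuning parameter)."""
    maps = np.asarray(maps, dtype=float)
    grand_mean = maps.mean(axis=0)
    rms = np.sqrt(np.mean((maps - grand_mean) ** 2, axis=1))
    return rms < threshold


maps = [[0.0, 0.0], [0.1, -0.1], [2.0, 2.0], [-2.0, -2.0]]
mask = small_anomaly_mask(maps, threshold=0.5)
```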

This clustering algorithm was first applied to the time series of stream- 
function fields produced by an equivalent-barotropic quasi-geostrophic model 
with simplified Northern Hemisphere (NH) topography, zonal jet forcing and 
Ekman dissipation [Legras and Ghil, 1985]. The fields were spatially filtered
from 25 spherical harmonics to ten empirical orthogonal functions (EOFs). This 
model time series is 65 years long, providing much higher statistical signifi- 
cance than available atmospheric data sets, and has the further advantage that 
the sources of low-frequency variability are well understood. 

Six stable clusters were obtained, including that of small anomalies. 
They make up 62% of the data, leaving 38% for the diffuse, trivial cluster.

Clusters 1 and 2, in order of size, resemble the model's unstable equilibria 
termed Zonal 1 and Blocking. They contain the most persistent sequences, due to 
close passages near these equilibria. The first EOF is very nearly parallel to 
the straight line segment passing through the centers $c_1$ and $c_2$ of these two
clusters. 

Clusters 3 and 4 resemble opposite phases of the wave-train pattern also 
detected by the pattern correlation method (PCM) for quasi-stationary (QS) 
events in Mo and Ghil [1987]. They are less persistent and determine, subject 
to the usual orthogonality constraint, the direction of EOF 2. Cluster 5, the
smallest in size, is also the least persistent, and resembles yet another
unstable equilibrium of the model, Zonal 2. EOF 3 points in the direction of
this cluster, within the subspace orthogonal to EOFs 1 and 2.

The projection of the sample probability density function (pdf) of this 
model solution onto EOF 1 gives a univariate pdf which is clearly bimodal. 
The two modes are produced by the persistent sequences in Clusters 1 and 2. 
Univariate pdfs along EOFs 2 and 3 are strongly skewed, but unimodal. 
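A quick way to look for the bimodality described here is to count the local maxima of a smoothed estimate of the univariate pdf. The sketch below uses a Gaussian kernel density estimate on a regular grid; the bandwidth is a free, hypothetical parameter (the mode count depends on it), the synthetic samples stand in for projections onto an EOF, and the procedure does not reproduce the significance testing behind the results reported here.

```python
import numpy as np


def count_modes(sample, bandwidth, n_grid=513):
    """Count local maxima of a Gaussian kernel density estimate of a 1-D sample."""
    x = np.asarray(sample, dtype=float)
    grid = np.linspace(x.min() - 3 * bandwidth, x.max() + 3 * bandwidth, n_grid)
    z = (grid[:, None] - x[None, :]) / bandwidth
    density = np.exp(-0.5 * z ** 2).sum(axis=1) / (x.size * bandwidth * np.sqrt(2 * np.pi))
    mid = density[1:-1]
    # ">=" on the right-hand side counts a flat two-point top only once
    peaks = (mid > density[:-2]) & (mid >= density[2:])
    return int(peaks.sum())


# Synthetic projections: two well-separated groups versus a single group
two_regimes = np.concatenate([np.linspace(-3.5, -2.5, 50), np.linspace(2.5, 3.5, 50)])
one_regime = np.linspace(-1.0, 1.0, 100)
```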
We computed the transition matrix for the modified Markov chain whose states
are defined by the six nontrivial clusters, ignoring the diffuse one. This
matrix shows few reentries into any cluster. Transitions between the two
dominant clusters occur preferentially through Clusters 4 and 5, in one direction,
and through Clusters 5 and 3, in the other. The small-anomaly cluster is close
to the boundary of the attractor basins of Clusters 1 and 2, and is also on one
preferred path from Cluster 2 to 1, via Cluster 5 and a wave train.

The second and third steps in defining our LRF procedure are determining 
the expected residence time in each regime, and the most likely successor to 
each regime. These two steps have been taken above for the model. The fourth 
and fifth are a dynamical explanation of the results for the first three steps, 
and a practical verification. 

For the model, we understand the dynamic origin of Clusters 1, 
related to close passages of the time-dependent solution by unstable equilibria 
with many directions of stability and few directions of instability. The size 
and mean residence time for these clusters are determined by the relative 
stability of the respective equilibria. 

Cluster 6 is given by the slowing down of trajectories close to a compli- 
cated basin boundary, and this also explains its role in transitions between 
Clusters 1 and 2. The nature of Clusters 3 and 4, and their role in transitions 
between the dominant clusters, is more speculative. Their wave-train nature 
suggests a phenomenology similar to certain standing or slowly-moving Rossby 
waves in the atmosphere. But the phase-space structures associated with these 
waves require further elucidation, and we expect to do this in a more detailed 
and realistic model.

The fifth point in our LRF procedure, actual verification, only makes sense
for the atmosphere itself. We have therefore carried out steps one through three
of the proposed LRF procedure for a data set of 500 mb geopotential heights from
20 NH winters, January 1963-December 1982. In the atmosphere, relatively
low-frequency, mostly barotropic flow structures coexist with intermediate-frequency,
largely baroclinic waves. This data set was hence separated into a low-pass and
a band-pass window by suitable filters [Blackmon, 1976].

The low-pass variability, of ten days and longer, was spatially filtered by 
projection onto seven EOFs, and the band-pass data by projection onto six EOFs. 
The low-pass data exhibit seven stable clusters, including that of small anoma- 
lies. The clusters make up only 41% of the data, compared to 62% for the 
model's clusters. This percentage of recognizable clustering is clearly lower 
in the atmosphere due to the additional degrees of freedom, but is still quite 
encouraging, and suggests that we might be on a promising road to LRF indeed. 

Clusters 1 and 2 have a wavenumber-three pattern, with nearly opposite 
phases. They determine together the direction of the first EOF. Cluster 1 
shows the extensively studied Pacific influence on North America, Cluster 2 a 
similar influence of the Atlantic on Northern Europe. 

Clusters 3 and 4 are dominated by zonal wavenumber two. Subject to the 
well-known orthogonality constraint of classical EOFs, they determine together 
the second EOF. Cluster 3 represents most of the interannual variability in 
the data, has zonal flow over the Pacific, and a blocking high over the North- 
ern Soviet Union. Cluster 4 exhibits a marked blocking pattern over both the 
Atlantic and the Pacific Oceans.

Clusters 5 and 6 contain a more complicated distribution of waves, and have 
largest projections of opposite signs onto EOF 3. Cluster 5 has a Western 
Atlantic dipole and a wave train over Eurasia, Cluster 6 a very pronounced
Pacific/North American (PNA) pattern.

The result most unexpected by, and therefore most interesting to us, is 
the localization of features exhibited by these six clusters. While all 
patterns are hemispheric, there is a clear tendency for each cluster within a 
pair to show larger and more significant features in one of two quadrants. 
These quadrants, which could be called eastern and western, or Atlantic and 
Pacific, are separated by the polar great circle composed of, roughly 
speaking, the 60°W and 120°E meridians.

The PNA pattern in Cluster 1 is much stronger than the Eurasian wave train, 
while in Cluster 2 the Western Atlantic - Greenland - Northern Europe tele- 
connection dominates. In Cluster 3 the most important feature is the positive 
anomaly over the Urals, while in Cluster 4 it is the one over the Bering Sea. 
Finally, and most strikingly, the PNA pattern in Cluster 6 is clearly 
complementary to the wave train trailing off the Western Atlantic dipole in 
Cluster 5.

The regional, rather than hemispheric character of many persistent anomalies 
is subjectively well known to classical, synoptic-statistical practitioners of 
LRF. It provided the basis for the local definition of anomalies in the work of 
Dole [1986] and for the teleconnection approach of Wallace and Gutzler [1981].
The interest of the present result is that we did not build this regionality
into our search for preferred patterns, but obtained it objectively and quite
independently of the search procedure. It follows that partial regionality,
with weaker hemispheric concomitants, is indeed a fact of large-scale atmos- 
pheric life. 

The reasons for this sectorial confinement of low-frequency variability
have been studied by Held [1983], among others. Essentially, the propagation
speed of atmospheric features has to compete with the dissipation of their
energy. Taking a heuristic 10 m/s for the order of magnitude of the zonal
propagation velocity of the energy through a stationary or slowly-moving wave
train, and 10 days for the order of magnitude of the dissipation time, one
obtains a distance of roughly 8600 km, i.e., about 100 degrees of longitude in
midlatitudes, for the spread of a locally-generated low-frequency disturbance.
The limits of the sectors indicated above strongly suggest that the two jet
exit regions over the western part of the NH oceans are a major localized
source of energy for low-frequency variability, in agreement with the
suggestions of Green [1977], Kalnay-Rivas and Merkine [1981], and
Shutts [1983]. It is both difficult and necessary to reconcile this point of
view with the spatially global one of resonant reinforcement between the flow
in two sectors, i.e., what wags the jet whose exit wags a wave train?
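The order-of-magnitude estimate above can be checked directly: 10 m/s sustained for 10 days covers 8640 km. Converting this to longitude requires a latitude; taking 45°N and a spherical Earth of radius 6371 km (both our assumptions) gives roughly 110 degrees, and about 100 degrees at 40°N, consistent with the figure quoted. The function name is hypothetical.

```python
import math

EARTH_RADIUS_M = 6.371e6   # assumed mean Earth radius
SECONDS_PER_DAY = 86400.0


def longitude_spread_deg(speed_m_s=10.0, decay_days=10.0, latitude_deg=45.0):
    """Degrees of longitude covered at `latitude_deg` by energy travelling at
    `speed_m_s` for one dissipation time of `decay_days`."""
    distance_m = speed_m_s * decay_days * SECONDS_PER_DAY
    metres_per_degree = (2 * math.pi * EARTH_RADIUS_M
                         * math.cos(math.radians(latitude_deg)) / 360.0)
    return distance_m / metres_per_degree


spread_45 = longitude_spread_deg()
spread_40 = longitude_spread_deg(latitude_deg=40.0)
```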

Hence our localization result raises more theoretical questions than it 
answers. But from the point of view of describing, rather than explaining low- 
frequency variability, it is rather gratifying: the success of the local 
approaches of Dole and Gordon [1983] and Wallace and Gutzler [1981] appears to
be less surprising, and not at all at odds with a global view of atmospheric
dynamics. It also helps explain the fact that varimax orthogonal rotation of
EOFs, which favors a priori regional patterns [Horel, 1981; Barnston and
Livezey, 1987], tends to produce patterns similar to those of the clusters
here. Rotated EOFs are simply less inhibited by orthogonality from pointing at 
the natural clusters of low-frequency variability, and obliquely rotated EOFs 
would essentially point straight at them, with all their intrinsic regionality. 

Bimodality and Transitions between Regimes 

Univariate pdfs in the first three principal components (PCs) of the low- 
passed NH winter data are noticeably skewed, but unimodal. Model results 
showed that bimodality in the first PC is induced by the highly persistent
flow sequences associated with Clusters 1 and 2, respectively. Restricting 
the NH atmospheric data to persistent sequences only shows strong bimodality 
of the first PC, with very high statistical significance. 

The bimodality in this case is produced by sequences in Clusters .... for 
the "Pacific" phase of EOF 1 and by sequences in Clusters . . . for the "Atlantic" 
phase. We notice that separate sequences with large components of zonal wave- 
number two, three and four play a role in producing both maxima of the 
univariate pdf with respect to the first PC. This result provides a view of 
bimodality in low-frequency atmospheric variability which is complementary to, 
and somewhat more complex than that of Benzi et al. [1986].

Our view of multiple planetary flow regimes via more than two clusters is 
possibly closer to synoptic experience, and hence richer in its promise for 
LRF. The transition matrix for the Markov chain of seven clusters, including 
that of small anomalies, provides useful qualitative information. The small 
number of reentries supports our choice of cluster size, and shows the clusters 
to be well separated. 

Preferred paths between clusters are in evidence. The small-anomaly cluster 
plays an important role on some of these, indicating its position on the 
boundary between attractor basins of several clusters. 

Additional transitions between low-pass clusters are likely to be associated 
with preferred patterns in the band-pass window, of 2.5 days to 6 days, roughly 
speaking. There are six nonexceptional clusters in this window, all showing 
meridionally-elongated wave trains with zonal wavenumbers of seven or eight. 
One pair of clusters is associated with an obviously baroclinic wave train of 
this type in the Atlantic jet exit region, the second pair has its strongest 
features in and downstream of the Pacific jet exit, and the last pair has
significant features in both these regions. The wave train of one cluster
within a pair is in phase quadrature with the other.

These findings appear to be interesting enough to warrant further explora- 
tion of the proposed road to LRF. The basic questions that need to be 
answered are both quantitative and qualitative. Quantitatively, one needs 
stable statistics of regime persistence and of transition probabilities. 
These can be obtained at present only by careful and extended experimentation 
with general circulation models (GCMs). 

First, one needs to verify that a GCM produces essentially the same 
clusters as found in the data, and that the persistences and transition
probabilities are equal, to within sampling error, to those in the data. Second,
the GCM can be run for a sufficiently long period to obtain stable statistics. 
Third, one has to find how these statistics change when boundary data, such as 
sea-surface temperatures, are changed. Finally, the statistics obtained from 
the GCM have to be tested in a predictive mode. 

Qualitatively, one would like to know what generates the multiple regimes, 
and connects them by preferred paths. For instance, is it really true that 
particularly persistent sequences are generated by close passages near an 
unstable equilibrium, as the simple model here suggests? Are some of the 
preferred paths initiated by instabilities of such equilibria? What is the 
relative importance of barotropic and baroclinic instabilities in the "break" 
of a persistent flow pattern? We hope to find some of the answers to these 
questions in future work, observational, numerical and theoretical. 

APPENDIX A. CLUSTERING ALGORITHM 

This is the convergent version, proposed by Anderberg [1973, pp. 162-163],
of MacQueen's [1967] k-means algorithm. It has the advantage of being
relatively inexpensive computationally, while still producing a partition close,
at least locally in configuration space, to an optimum. Unfortunately, both
convergence and optimality proofs are only available when Euclidean distance,
rather than angle, is used for membership and separation criteria. We preferred
to use, cf. Section 3, a modification of this algorithm tailored to synoptic
experience with the NH data set, rather than rely on theoretical results which
might not provide the most significant clusters from a dynamic point of view.
The proof of the pudding is in the eating, as we saw.

The algorithm proceeds in two stages: (i) finding seed points for the k
clusters, and (ii) iterating to optimize the partition. The first stage is
essentially MacQueen's original algorithm, the second is essentially Anderberg's
variant.

(i) Seed points 

Step A1. Take any map in the time series as point 1.

Step A2. Proceed through the sequence, calculating the correlations
$\rho(\phi, c_k)$ between any given map $\phi(x, t)$ and the existing cluster
centers $c_1, \ldots, c_m$. If $\rho(\phi, c_k) > r_1$, then $\phi$ is assigned
to cluster $k$ and $c_k$ is recomputed. If, on the other hand,
$\rho(\phi, c_k) \le r_2$ for all $c_k$, $k = 1, \ldots, m$, then $\phi$ is
allowed to form a new cluster, $\phi = c_{m+1}$. If the exclusion criterion (7)
is satisfied, then $\phi$ is assigned to the special, diffuse cluster.

Step A3. Keep centers fixed, and make one pass through the data, assigning
points $\phi$ to existing clusters if $\rho(\phi, c_k) > r_1$ for some $k$, and
to the diffuse cluster otherwise.

(ii) Iteration

Step B1. Recompute the centers of clusters using current membership.

Step B2. Compute the pattern correlations between pairs of centers. If
$\rho(c_j, c_k) \le r_2$ for all pairs, the algorithm terminates. If, on the
other hand, $\rho(c_{j_0}, c_{k_0}) > r_2$ for a given pair, reassign all
elements in the smaller cluster, according to step A2.

Step B3. Repeat steps A3 through B2 until no more than $N_0$ points get
reassigned in the last step, and no clusters smaller than $L_0$ elements exist.

$N_0$ was taken equal to $L_0$, the small-cluster criterion (see end of
Section 4). The number of iterations necessary was ... for the model and ...
for the NH data.
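The two stages above can be sketched in Python as follows. This is an illustrative reading of Steps A1 to B3, not the code used for this paper, and it simplifies in three places: pattern correlation is taken as the uncentered cosine of the angle between anomaly vectors; the exclusion criterion (7) is not modeled, so maps that are neither close enough to join a cluster nor far enough to seed one simply remain unclassified; and in Step B2 the smaller of two overlapping clusters loses its center and its maps are reassigned by Step A3 rather than by a new pass of Step A2. The parameters r1, r2, n0 and l0 mirror $r_1$, $r_2$, $N_0$ and $L_0$; their default values and the synthetic data are hypothetical.

```python
import numpy as np


def corr(a, b):
    """Uncentered pattern correlation (cosine of the angle) between two maps."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


def seed_centers(X, r1, r2):
    """Stage (i), steps A1-A2: one MacQueen-style pass that seeds cluster centers."""
    centers, members = [X[0].copy()], [[0]]          # A1: first map seeds cluster 1
    for i in range(1, len(X)):
        c = [corr(X[i], ck) for ck in centers]
        k = int(np.argmax(c))
        if c[k] > r1:                                 # join cluster k, update its center
            members[k].append(i)
            centers[k] = X[members[k]].mean(axis=0)
        elif c[k] <= r2:                              # far from every center: new cluster
            centers.append(X[i].copy())
            members.append([i])
        # otherwise the map stays unclassified (criterion (7) is not modeled)
    return centers


def assign(X, centers, r1):
    """Step A3: label each map with its best-correlated center, or -1 (diffuse)."""
    if not centers:
        return np.full(len(X), -1)
    C = np.array(centers)
    cc = (X @ C.T) / np.outer(np.linalg.norm(X, axis=1), np.linalg.norm(C, axis=1))
    return np.where(cc.max(axis=1) > r1, cc.argmax(axis=1), -1)


def modified_kmeans(X, r1, r2, n0=0, l0=2, max_iter=100):
    """Stage (ii), steps B1-B3, wrapped around the seeding and assignment steps."""
    centers = seed_centers(X, r1, r2)
    labels = assign(X, centers, r1)
    for _ in range(max_iter):
        # B1: recompute centers from current membership, dropping clusters below l0
        keep = [k for k in range(len(centers)) if np.sum(labels == k) >= l0]
        centers = [X[labels == k].mean(axis=0) for k in keep]
        sizes = [int(np.sum(labels == k)) for k in keep]
        old = np.full(len(X), -1)
        for new_k, k in enumerate(keep):
            old[labels == k] = new_k
        # B2: look for one pair of centers violating the separation criterion
        pair = next(((j, k) for j in range(len(centers)) for k in range(j + 1, len(centers))
                     if corr(centers[j], centers[k]) > r2), None)
        if pair is not None:                          # drop the smaller cluster's center;
            drop = pair[0] if sizes[pair[0]] < sizes[pair[1]] else pair[1]
            del centers[drop]                         # its maps are reassigned below
            old[old == drop] = -1
            old[old > drop] -= 1
        labels = assign(X, centers, r1)               # A3 again with the new centers
        n_changed = int(np.sum(labels != old))
        small = any(np.sum(labels == k) < l0 for k in range(len(centers)))
        if pair is None and n_changed <= n0 and not small:   # B3 stopping rule
            break
    return np.array(centers), labels


# Synthetic test: noisy copies of two orthogonal patterns, alternating in time
rng = np.random.default_rng(0)
base = np.eye(10)[:2]
X = np.array([base[i % 2] + 0.05 * rng.standard_normal(10) for i in range(60)])
centers, labels = modified_kmeans(X, r1=0.8, r2=0.5)
```

On these two noisy, mutually orthogonal patterns the sketch recovers two clusters, each holding the alternating maps of one pattern. As noted above, the convergence guarantees of Anderberg's variant are proved for Euclidean distance and do not automatically carry over to this angle-based measure, which is why the loop is capped by max_iter.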
TABLE 1. Percentage of Variance Associated with each EOF, and its Empirical
Uncertainty

EOF       Low Pass (%)               Band Pass (%)
  1       10.7 ± 1.1                 5.01 ± 0.35
  2        8.8 ± 0.9                 4.85 ± 0.34
  3        7.4 ± 0.8                 4.26 ± 0.30
  4        6.7 ± 0.7                 4.09 ± 0.28
  5        6.1 ± 0.6                 3.00 ± 0.21
  6        5.3 ± 0.5                 3.00 ± 0.21   (24.2%)
  7        5.1 ± 0.5   (50.1%)       2.49 ± 0.17
  8        4.4 ± 0.4                 2.42 ± 0.17
  9        3.9 ± 0.4   (58.5%)       2.29 ± 0.16   (31.4%)
 10        3.4 ± 0.3                 2.13 ± 0.15
 11        3.3 ± 0.3                 2.10 ± 0.15
 12        2.9 ± 0.3                 1.87 ± 0.13
 13        2.5 ± 0.2                 1.81 ± 0.13
 14        2.3 ± 0.2                 1.73 ± 0.12
 15        2.1 ± 0.2                 1.69 ± 0.11
(Total)                (74.9%)                     (42.7%)

Partial totals of variance are given for subsets of EOFs used in
clustering computations.
TABLE 2. Clusters in Model Phase Space: Number of Elements and Persistence

Cluster     Number     % of    Number of Events of Given Duration (τ)            Average Persistence
            of Points  Total                                                     Times (τ)
                       Data      1    2    3    4    5    6    7    8    9  ≥10    T_d    T_p    T_w
1             772      11        7    7    1    3    2    1    6    0    0    9   21.4   40.8   14
2             768      11       15   24   17   11    9   11   15    4    2   20   6.01   10.0   16
3             142       2       25   36   11    3    0    0    -    -    -    0   1.89   0       9
4             121       1.7     66   21    0    2    1    0    -    -    -    0   1.35   5      10
5              88       1.2     82    3    0    -    -    -    -    -    -    0   1.04   0      12
Total/Ave.   1891      27      195   91   29   19   11   12   21    4    2   29   6.34   10.2   12.2

The persistence times are the average duration T_d of all events in each
cluster, the average duration T_p of all events lasting 5τ or longer, and the
average time T_w between the trajectory leaving a given cluster and reaching
another cluster.
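The three persistence times defined in this note can be computed from a sequence of cluster labels. In this sketch an event is a maximal run of consecutive maps in one cluster, diffuse maps carry the hypothetical label -1, and T_w is measured from the last map of an event to the first map of the next event in any cluster; whether reentries into the same cluster should count toward T_w is our assumption. Returning 0 when no event lasts 5τ mirrors the zeros in the T_p column. As a consistency check, T_d is the number of points divided by the number of events: for Cluster 1, 772 points in 36 events give about 21.4τ. The function names are illustrative.

```python
import numpy as np


def events(labels, diffuse=-1):
    """Split a label sequence into (cluster, start, length) runs, skipping diffuse maps."""
    out, t = [], 0
    while t < len(labels):
        start = t
        while t < len(labels) and labels[t] == labels[start]:
            t += 1
        if labels[start] != diffuse:
            out.append((int(labels[start]), start, t - start))
    return out


def _mean(values):
    """Mean of a list, or 0.0 when the list is empty (as in the T_p column)."""
    return float(np.mean(values)) if values else 0.0


def persistence_times(labels, cluster, min_len=5, diffuse=-1):
    """Return (T_d, T_p, T_w) for one cluster, in units of the sampling interval."""
    ev = events(labels, diffuse)
    durations = [n for c, _, n in ev if c == cluster]
    long_events = [n for n in durations if n >= min_len]
    # steps from the last map of an event to the first map of the next event
    waits = [ev[i + 1][1] - (s + n - 1) for i, (c, s, n) in enumerate(ev[:-1]) if c == cluster]
    return _mean(durations), _mean(long_events), _mean(waits)


labels = [0] * 6 + [-1, -1] + [1] * 2 + [-1] + [0] * 2 + [1] * 3
```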





TABLE 3. Nearest Distance between the Points of a Flow Sequence in a Given
Cluster, and the Center of that Cluster

Cluster 1
  Persistence (τ)   170   148   111    91    76    20    17     7     6     5     3   …
  d_min            0.45  0.48  0.50  0.90  0.68  1.15  1.84  1.50  1.78  1.94   …

Cluster 2
  Persistence (τ)    34    33    23    19    18    17    16    14    12     9
  d_min            0.82  0.70  0.87  1.05  1.06  1.04  0.89  1.20  1.18  1.23

Cluster 2 (cont'd)
  d_min            1.17  1.32  1.40  1.41  1.58  1.60  1.72

Values of d_min are nondimensional (see Legras and Ghil [1985], eqs. (1-4) and
TABLE 4. Fuzzy Clusters in Model Phase Space, Including Small-Anomaly Cluster

Cluster     Number     % of    Number of Events of Given Duration (τ)            Average Persistence
            of Points  Total                                                     Times (τ)
                       Data      1    2    3    4    5    6    7    8    9  ≥10    T_d    T_p    T_w
1            1222      18       31   20   11    7    9    6    1    8    5   16   10.7   24.2   4.9
2            1384      20       41   17   13    7    7    0    1    3    1   18   10.9   35.3   4.6
3             462       6.6     40   50   40   25   18    2    0    -    -    0    2.6    5.1   3.7
4             530       7.5     83  113   47    9    3    2    0    1    1    0    2.1    6.3   3.0
5             384       5.4     86  130    5    3    2    0    -    -    -    0    1.7    0     3.5
6             333       4.8     62   42   21   16    3    3    1    0    0    1    2.2    6.6   3.4
Total/Ave.   4315      62      343  372  137   67   42   13    3   12    7   35   5.03   12.9   3.85

See Table 2 and text for definition of T_d, T_p and T_w.





TABLE 5. Transitions between Pairs of Clusters

                        To
From      1     2     3     4     5     6     Sum    Average ± Std.
1        31     0    18    42    16     6     113    19   ± 1.33
2         0    20     4    14    47    23     108    18   ± 1.53
3        54    10    11    27     4    69     175    …    ± 1.83
4         5    18    40   137    48    11     259    43.1 ± 2.78
5         2    45    91    38    30    20     226    37.6 ± 1.92
6        22    15    11     1    80    20     149    24.8 ± 3.7

The numbers which are significantly larger than those given by equal
probabilities are indicated by boldface. Significant is taken as average number
of transitions, plus one standard deviation, e.g., 20.3 for transitions from
Cluster 1, or 19.5 from Cluster 2.
TABLE 6. Statistics of Clusters in Low-Pass Filtered Data

Cluster  Number of Elements (days)    Projection along EOF Axis                     Average Persistence    QS Days
No.      and Percentages                                                            Times (days)
         Hard          Fuzzy          1     2     3     4     5     6     7         T_d    T_p    T_w      No.    %
         No.    %      No.     %
1        65     2.68   150.5   6.2    -8.9  -1.0   2.9  -3.3  -2.6  -2.7   2.9      6.3    9.9    7.9      130    …
2        57     2.35   139     5.7     7.4   3.2  -6.0  -1.2  -0.3   0.8  -0.5      6.3    8.6   11.6       95    …
3        54     2.23   161.5   6.6    -6.6   4.4  -5.4  -2.4   2.1   2.6   1.6      8.1   11.3    7.7      136    …
4        40.5   1.67    86.5   3.6    -6.3  -7.9  -0.2   7.4  -1.3  -3.4  -3.6      7.0    9.3   12.3       75    …
5        38     1.57   130.5   5.3     6.7  -4.7   6.6  -2.4   2.2   0    -1.0      5.2    7.9    9.5       71    …
6        36     1.48    86.5   3.6     4.3   1.5  -3.6   7.2   2.7  -8.1   1.3      5.2    9.8   14.0       59    …
7        36     1.48   116.5   4.8     2.1   5.4   7.9   1.5   1.4  -5.0   0.9      5.1    8.9    9.5       54    …
8        27     1.11    68     2.8     1.6   0.9   2.4  -3.1   3.9  -0.2   4.6      3.8    7.7    6.4       23   33.1
9        66     2.72    66     2.7     0     0     0     0     0     0     0        2.4    …      6.4       18    …
Total   419.5  17.3   1005    41.3                                         Ave.     5.49   8.71   9.48     661

The number of elements is listed in days, and 0.5 indicates that only one map
out of two for a given day is in the cluster. The last two columns give the
number of days of QS events within each cluster, and the corresponding
percentage.